It is well-known that a small system weakly coupled to a large energy bath will, when the total
system is in a microcanonical ensemble, find itself to be in an (approximately) thermal state (i.e.
canonical ensemble) and, recently, it has been shown that, if the total state is, instead, a random
pure state with energy in a narrow range, then the small system will still be approximately thermal
with a high probability (defined by 'Haar measure' on the total Hilbert space). Here we ask what
conditions are required for something resembling either/both of these 'traditional' and 'modern'
thermality results to still hold when the system and energy bath are of comparable size. In Part 1,
we show that, for given system and energy-bath densities of states, $\sigma_S(\epsilon)$ and $\sigma_B(\epsilon)$, thermality does
not hold in general, as we illustrate when $\sigma_S$ and $\sigma_B$ both increase as powers of energy, but that
it does hold in certain approximate senses, in both traditional and modern frameworks, when $\sigma_S$
and $\sigma_B$ both grow as $e^{b\epsilon}$ or as $e^{q\epsilon^2}$ (for constants $b$ and $q$) and we calculate the system entropy in
these cases. In their 'modern' version, our results rely on new quantities, which we introduce and
call the S and B 'modapprox' density operators, which are defined for any positively supported,
monotonically increasing, $\sigma_S$ and $\sigma_B$, and which, we claim, will, with high probability, closely
approximate the reduced density operators for the system and energy bath when the total state of
system plus energy bath is a random pure state with energy in a narrow range. In Part 2 we clarify
the meaning of these modapprox density operators and give arguments for our claim.

The prime examples of non-small thermal systems are quantum black holes. Here and in two
companion papers, we argue that current string-theoretic derivations of black hole entropy and
thermal properties are incomplete and, on the question of information loss, inconclusive. However,
we argue that these deficiencies are remedied with a modified scenario which relies on the modern
strand of our methods and results here and is based on our previous matter-gravity entanglement
hypothesis.
I. INTRODUCTION

A. Background

This paper is concerned with the general question: "How do physical systems get to be hot?". By 'hot' here, we
do not simply mean 'having lots of energy'. We shall reserve the word 'energetic' for that. Rather, we mean the
more specialized notion of being in what is known, in (quantum) statistical mechanics, as a Gibbs state, i.e. a state
described by a density operator of form

$$\rho^{\rm Gibbs} = Z_\beta^{-1} e^{-\beta H} \qquad (1)$$

where $H$ is a suitable (usually, of necessity, approximate) Hamiltonian (assumed to have discrete spectrum) for the
system and $\beta$ is related to the system's temperature, $T$, by $\beta = 1/kT$ where $k$ is Boltzmann's constant (henceforth
set to 1). Here $Z_\beta$ stands for ${\rm tr}(e^{-\beta H})$ and is the normalization constant which ensures that $\rho^{\rm Gibbs}$ will have unit
trace. (When regarded as a function of $\beta$ it is, of course, the system's 'partition function'.) Such states are also
known as 'canonical states' or 'thermal equilibrium states' or 'KMS states'. We shall sometimes refer to them simply
as 'thermal' states. A possible source of confusion here is the fact that it is sometimes found to be convenient to
adopt the fiction that a system which is merely energetic is in a Gibbs state at a temperature chosen so as to give
it the same mean energy. Additionally, given a system with a density of states $\sigma(\epsilon)$, it can sometimes be convenient
to assign to it a 'temperature', $T(\epsilon)$, at each energy, $\epsilon$, according to the formula $1/T(\epsilon) = d\log\sigma(\epsilon)/d\epsilon$ [1]. We wish
to underline that we shall not be concerned with such a fiction, nor with such an assignment of an energy-dependent
'temperature', here. Rather we are interested in how systems get into states which are actually Gibbs states. In
particular, we are interested in black bodies, and, more particularly, black holes (in suitable boxes; here we refer
to the remarkable developments in 'Euclidean Quantum Gravity' and in (Quantum) 'Black Hole Thermodynamics'
which arose from Hawking's pioneering work [2] on 'Black Hole Evaporation' - see e.g. the papers on quantum black
holes in the collections [3, 4]).

Of course, one way for a system to get into a Gibbs state is for it to be weakly coupled to a (much larger) heat bath 
which is already in a Gibbs state at the desired temperature. There is a considerable literature, which, with varying 
degrees of mathematical rigour and generality, shows that, as one might expect, a typical such system will, more or 
less irrespective of its initial state, approximately get into a Gibbs state at the same temperature at late times - see 
e.g. [5], [6]. However, what we are really interested in when we ask our general question "How do physical systems
get to be hot?" is: 

"How does any physical system ever get to be hot in the first place?" 

Obviously, an explanation of how one system gets to be hot which invokes the existence of another system (the 
above-mentioned heat bath) which is assumed already to be hot can't help to answer this version of our question! 

Another traditional explanation for the propensity of some systems to be in Gibbs states goes along the following
lines (see e.g. [7] and, for a treatment of some of the related mathematical aspects, e.g. [5] as well as the paper [5]
which recalls this traditional explanation as a preliminary to its main purpose - for which see below): One assumes
one's system of interest, say described by a Hamiltonian, $H_S$, on a Hilbert space, $\mathcal{H}_S$, to be weakly coupled to a much
larger 'energy bath', with Hamiltonian, $H_B$, on a Hilbert space, $\mathcal{H}_B$ - both Hamiltonians being assumed to have a
finite number of energy levels in any finite energy interval, with the number of states of the energy bath in an energy
interval, $\delta$, being approximately given in terms of a 'density of states', $\sigma_B$, as $\sigma_B(\epsilon)\delta$ - $\sigma_B$ being assumed to have some
typical, say, power-law form (see below) - and one assumes the whole system to be in a total microcanonical state.
Before we explain what we mean by this, we pause to remark, first, that, in order to avoid ambiguous usages of the
word 'system', we shall, from now on, adopt the word totem (short for 'total system') to denote what we referred to
above as our 'whole system'. So we shall talk about a 'totem' which consists of a 'system', 'S', and an 'energy bath',
'B'. Our assumption of weak coupling is then the assumption that the totem Hamiltonian will take the form

$$H = H_S \otimes 1 + 1 \otimes H_B + \text{weak coupling term} \qquad (2)$$

on the totem Hilbert space, $\mathcal{H} = \mathcal{H}_S \otimes \mathcal{H}_B$, and we shall assume further that the coupling term is so weak that it can
be neglected for state and energy-level counting purposes. To say that our totem is in a microcanonical state then
means to assume it is described by the density operator

$$\rho_{\rm microc} = M^{-1} \sum_{\epsilon} |\epsilon\rangle\langle\epsilon| \qquad (3)$$

on the totem Hilbert space, $\mathcal{H}$, where the sum is over a basis of energy eigenstates for the subspace of $\mathcal{H}$ consisting
of energy levels with energies in an interval, $[E, E + \Delta]$, which is small, yet large enough for the total number of
totem energy eigenstates in this range to be very large, while the normalization constant, $M$ (which is expected to
roughly scale with $\Delta$) is the total number of such basis eigenstates. We further pause to note that we shall assume
throughout the present paper, as is usually assumed for 'ordinary' physical systems, that both Hamiltonians, $H_S$ and
$H_B$, are positive and their densities of states monotonically increasing. We remark though that, as we will discuss
further in Section VIII, were any of these assumptions to be relaxed, then the prospects for systems to become hot
become much less constrained and, in particular, there are ways in which a system can be hot while the totem is in
a pure state which differ from the 'modern' scenarios we discuss below.

Proceeding with the above assumptions, the states, $|\epsilon\rangle$, in the sum in (3) will each take the form $|\epsilon_S\rangle \otimes |\epsilon_B\rangle$ and the
sum over totem energy levels will become (see (6) below) a double sum over system energy levels, $\epsilon_S$, and energy-bath
energy levels, $\epsilon_B$, which satisfy the condition $\epsilon_S + \epsilon_B \in [E, E + \Delta]$. The resulting state of the system is then represented
mathematically, in the usual way, by the reduced density operator, $\rho_S^{\rm microc}$, on $\mathcal{H}_S$, i.e. by the partial trace of $\rho_{\rm microc}$
over $\mathcal{H}_B$.

To remind ourselves how thermality of our system can then come about in this traditional explanation, it is
instructive first to consider an oversimplified model in which our system Hilbert space, $\mathcal{H}_S$, is two-dimensional with
only two energy levels with energies $\epsilon_1$ and $\epsilon_2$ such that $\epsilon_2 - \epsilon_1 \gg \Delta$ and in which the density of states, $\sigma_B$, of the
energy bath grows exponentially - we shall write $\sigma_B(\epsilon) = c e^{b\epsilon}$. (We shall discuss the case where system and
energy bath both have such a density of states in Sections III and V.)



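Then we easily see that $\rho_S^{\rm microc}$ will be approximately

$$\rho_S^{\rm microc} = n^{-1}\left\{c\Delta e^{b(E-\epsilon_1)}|\epsilon_1\rangle\langle\epsilon_1| + c\Delta e^{b(E-\epsilon_2)}|\epsilon_2\rangle\langle\epsilon_2|\right\}$$

where $n$ denotes the appropriate normalization constant, and this is clearly the same as the Gibbs state

$$\rho_\beta = Z_\beta^{-1}\left(e^{-\beta\epsilon_1}|\epsilon_1\rangle\langle\epsilon_1| + e^{-\beta\epsilon_2}|\epsilon_2\rangle\langle\epsilon_2|\right)$$

for $\beta = b$ for a suitable, normalizing, $Z_\beta$.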
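The two-level example above can be checked numerically. The following is a minimal sketch, not part of the original argument: the level energies, the total energy `E`, the growth constant `b`, the prefactor `c` and the window `delta` are hypothetical values chosen only for illustration. It weights each system level by the number of bath states at the complementary energy and compares the normalized populations with Gibbs populations at $\beta = b$.

```python
import math

def reduced_populations(levels, E, b, c=1.0, delta=1e-3):
    """Populations of the system levels in the reduced microcanonical state.

    Each system level eps is weighted by the number of bath states,
    c * delta * exp(b * (E - eps)), available at bath energy E - eps.
    """
    weights = [c * delta * math.exp(b * (E - eps)) for eps in levels]
    norm = sum(weights)  # plays the role of the constant n in the text
    return [w / norm for w in weights]

def gibbs_populations(levels, beta):
    """Populations exp(-beta * eps) / Z of a Gibbs state on the same levels."""
    weights = [math.exp(-beta * eps) for eps in levels]
    z = sum(weights)
    return [w / z for w in weights]

levels = [1.0, 3.0]  # hypothetical energies eps_1, eps_2
E, b = 50.0, 0.7     # hypothetical total energy and growth constant

micro = reduced_populations(levels, E, b)
gibbs = gibbs_populations(levels, beta=b)
print(micro)
print(gibbs)
```

The common factor $c\Delta e^{bE}$ cancels on normalization, which is why the two lists agree for any choice of `E`, `c` and `delta`.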

In the full story, where we now assume that also the states of the system are approximately given by a density of
states, $\sigma_S$, it is convenient to assume that $E$ is an integral multiple of $\Delta$ and locally to slightly distort the spectra of
system and energy bath so that their energy levels are evenly spaced at intervals $\Delta, 2\Delta, \ldots, E$ with each system level
having degeneracy

$$n_S(\epsilon) = \sigma_S(\epsilon)\Delta \qquad (4)$$

and each energy-bath level having degeneracy

$$n_B(\epsilon) = \sigma_B(\epsilon)\Delta. \qquad (5)$$

If, as we shall further assume, this can be done in such a way as to maintain the same 'smoothed out' densities of
states, then it will not seriously alter the values of any quantities of interest. Choosing a basis within the degeneracy
subspace of $\mathcal{H}_S$, with each energy, $\epsilon$, and labelling its elements $|\epsilon, i\rangle$, where, for each $\epsilon$, $i = 1, \ldots, n_S(\epsilon)$ while $\epsilon$ ranges
from $\Delta$ to $E$ in integer steps of $\Delta$ (and similarly for the energy bath) we then easily have that $\rho_{\rm microc}$ (3) can be
rewritten as

$$\rho_{\rm microc} = M^{-1} \sum_{\epsilon_S}\sum_{\epsilon_B}\sum_{i}\sum_{j} |\epsilon_S, i\rangle \otimes |\epsilon_B, j\rangle\langle\epsilon_S, i| \otimes \langle\epsilon_B, j| \qquad (6)$$

where the sum over $i$ goes from 1 to $n_S(\epsilon_S)$, the sum over $j$ goes from 1 to $n_B(\epsilon_B)$ and the sums over $\epsilon_S$ and $\epsilon_B$ are
over values which are positive-integer multiples of $\Delta$ and are constrained to have $\epsilon_S + \epsilon_B = E$, while the normalization
constant, $M$, defined after (3), is also given by

$$M = \sum_{\epsilon=\Delta}^{E} n_S(\epsilon)\, n_B(E - \epsilon) \qquad (7)$$

or, roughly equivalently [14], by making the replacement

$$\sum_{\epsilon=\Delta}^{E} \rightarrow \Delta^{-1}\int_0^E d\epsilon, \qquad (8)$$

by the approximate formula

$$M = \Delta \int_0^E \sigma_S(\epsilon)\,\sigma_B(E - \epsilon)\, d\epsilon. \qquad (9)$$

Moreover, $\delta/\Delta$ times the summand in (7) or $\delta\Delta$ times the integrand in (9) is, for suitable (small but not too small)
$\delta$ (approximately) the number of energy eigenstates for which the energy of the totem lies in the interval $[E, E + \Delta]$
while the energy of the system lies in the interval $[\epsilon, \epsilon + \delta]$. When our totem is in the microcanonical state (3), (6),
this number divided by $M$ may thus be interpreted as the probability that the system energy lies in this latter
interval. We shall denote it by $P_S(\epsilon)\delta$ and call $P_S(\epsilon)$ the system's energy probability density so we have

$$P_S(\epsilon) = \frac{\Delta}{M}\,\sigma_S(\epsilon)\,\sigma_B(E - \epsilon) \simeq \frac{1}{M\Delta}\, n_S(\epsilon)\, n_B(E - \epsilon), \qquad (10)$$

and we notice, in passing, that

$$P_B(\epsilon) = P_S(E - \epsilon).$$
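As a hedged illustration of the counting in (4), (5), (7) and (10), the sketch below evaluates the normalization $M$ and the per-level probabilities $n_S(\epsilon) n_B(E - \epsilon)/M$ on the evenly spaced grid described above; dividing a per-level probability by $\Delta$ gives the density $P_S(\epsilon)$. The function name, the two power-law densities and the values of `E` and `delta` are assumptions made only for this example, and degeneracies are treated as real numbers rather than integers.

```python
def energy_probabilities(sigma_s, sigma_b, E, delta):
    """Return (levels, M, probs) for a totem of energy E on a grid of spacing delta.

    levels are the system energies delta, 2*delta, ..., E; M is the sum in
    Eq. (7); probs[k] is n_S(eps) * n_B(E - eps) / M for the k-th level.
    """
    steps = round(E / delta)
    levels = [k * delta for k in range(1, steps + 1)]
    counts = [(sigma_s(eps) * delta) * (sigma_b(E - eps) * delta) for eps in levels]
    M = sum(counts)
    return levels, M, [c / M for c in counts]

# Hypothetical densities of states, chosen only for illustration.
sigma_s = lambda eps: eps ** 2
sigma_b = lambda eps: eps ** 3

E, delta = 10.0, 0.5
levels, M, p_s = energy_probabilities(sigma_s, sigma_b, E, delta)
# Swapping the roles of S and B gives the bath's energy probabilities.
_, M_b, p_b = energy_probabilities(sigma_b, sigma_s, E, delta)

print(M, sum(p_s))
```

For densities that vanish at zero energy, as here, the bath probabilities are the system probabilities read in reverse order, which is the discrete counterpart of $P_B(\epsilon) = P_S(E - \epsilon)$.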
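The reduced density operator, $\rho_S^{\rm microc}$, of $\rho_{\rm microc}$ on $\mathcal{H}_S$ will clearly be

$$\rho_S^{\rm microc} = M^{-1}\sum_{\epsilon=\Delta}^{E} n_B(E - \epsilon)\sum_{i=1}^{n_S(\epsilon)} |\epsilon, i\rangle\langle\epsilon, i|. \qquad (11)$$

(Here and below, to avoid cluttering up our formulae, we drop the 'S' suffix on $\epsilon$ - also in $|\epsilon, i\rangle$ - when there can be
no ambiguity.)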
One can then show, for a wide range of 'realistic' energy-bath models that, in the limit as the energy bath gets
large while the system remains unchanged, $\rho_S^{\rm microc}$ will converge to a thermal state at an inverse temperature $\beta$ given,
[15], in terms of the large-size behaviour of the energy bath's density of states.

In particular, and specializing now to a case (cf. again e.g. [7]) that will interest us further below, if the density of
states, $\sigma_B$, has the typical power-law form of ordinary (radiationless) matter:

$$\sigma_B(\epsilon) = A_B\,\epsilon^{N_B}, \qquad (12)$$

where $A_B$ is a constant and $N_B$ is an 'Avogadro-sized' number which could stand e.g. for '3/2 times the number of
molecules' in the energy bath or the 'number of oscillators' in the energy bath (see again e.g. [7] for the origin of the
3/2 etc.) then, in the limit as the total energy, $E$, of the totem gets larger while the size of the energy bath gets
larger - in the sense that $N_B$ gets larger - while the system remains unaltered and $N_B/E$ converges to a constant, $\beta$,
$\rho_S^{\rm microc}$ will converge to a thermal state at inverse temperature $\beta$ - i.e. to the $\rho^{\rm Gibbs}_{S,\beta}$ of Equation (13) below. In the
special case that the system has a density of states also of power-law form (see (18)) we shall provide a proof of this
result, in passing, in Section II below which is particularly instructive in relation to our present purposes. See the
last paragraph in Section II. So, in this way, one shows that a small system in contact with a large energy bath with
a suitable density of states will approximately be in a Gibbs state when the totem is in a microcanonical state.
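The mechanism behind this limit can be illustrated with a short numerical sketch, which is not a substitute for the proof referred to above. For the power-law bath (12), the weight that (11) attaches to a system level $\epsilon$, relative to the weight at $\epsilon = 0$, is $\sigma_B(E - \epsilon)/\sigma_B(E) = (1 - \epsilon/E)^{N_B}$, and with $E = N_B/\beta$ this approaches the Boltzmann factor $e^{-\beta\epsilon}$ as $N_B$ grows. The function name and the numerical values of `beta` and `eps` are hypothetical.

```python
import math

def bath_weight_ratio(eps, n_b, beta):
    """sigma_B(E - eps) / sigma_B(E) for sigma_B(e) = A_B * e**n_b, with E = n_b / beta.

    The prefactor A_B cancels in the ratio.
    """
    E = n_b / beta
    return (1.0 - eps / E) ** n_b

beta, eps = 2.0, 1.5  # hypothetical inverse temperature and system energy
for n_b in (10, 1000, 100000):
    # Compare the bath weight with the Boltzmann factor at the same beta.
    print(n_b, bath_weight_ratio(eps, n_b, beta), math.exp(-beta * eps))
```

The ratio only depends on the bath through $N_B$ and $E$, so keeping $N_B/E$ fixed while both grow is exactly the limit described in the text.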

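Above, a Gibbs state of our system will obviously take the form (assuming again the spectrum to be slightly
distorted as explained before equation (6))

$$\rho^{\rm Gibbs}_{S,\beta} = Z_{S,\beta}^{-1}\sum_{\epsilon=\Delta}^{\infty} e^{-\beta\epsilon}\sum_{i=1}^{n_S(\epsilon)} |\epsilon, i\rangle\langle\epsilon, i| \qquad (13)$$

where (approximating the obvious sum by an integral as we did when we passed from (7) to (9))

$$Z_{S,\beta} = \int_0^\infty \sigma_S(\epsilon)\exp(-\beta\epsilon)\, d\epsilon. \qquad (14)$$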

However, this traditional explanation of the origin of thermality (of a small system) is also unsatisfactory since it 
still begs the question of how the totem got into a microcanonical state. What would really be desirable would be an 
explanation of the origin of thermality consistent with the basic assumption of standard quantum mechanics that the 
total state of a closed system (in our case, our totem) is a pure state - i.e. in the language of density operators, the 
projector, $|\Psi\rangle\langle\Psi|$, onto a single vector, $\Psi$, in the closed system's (/our totem's) Hilbert space.

Such an explanation has, in fact, recently been given by a number of authors again for the case of a small system in
contact with a large energy bath. See especially the paper [9] entitled 'Canonical Typicality' by Goldstein, Lebowitz
et al. and also the references therein. The result of that paper - when specialized to our power-law density of states
model (12) - amounts to the statement that if, for a 'system' and 'energy bath' as considered above, one takes a
random pure state with energy in the energy range $[E, E + \Delta]$, then, again imagining the energy bath to get larger
while $N_B/E$ converges to $\beta$, for sufficiently large $E$, the reduced density operator of the system, $\rho_S^{\rm modern}$, will, with
very high probability, be very close to a Gibbs state (i.e. the $\rho^{\rm Gibbs}_{S,\beta}$ of (13)) at inverse temperature $\beta$.
We shall also re-obtain this result ourselves as a limiting case of one of our main new results in Section I D.
The precise mathematical statement can be inferred by inspecting the paper [9] and/or see the more general rigorous
result proved by Popescu et al. [12].

Goldstein, Lebowitz et al. define what they mean here by 'random' and by 'probability' by taking the natural
measure on the set of unit vectors of the relevant Hilbert space - assumed to have large, but finite, dimension $M$ -
by thinking of it as a $(2M - 1)$-dimensional real unit sphere and taking the natural invariant measure induced on
that by Haar measure on the orthogonal group. In doing so, they follow pioneering work of Lubkin [10] who, in 1978,
after introducing [11] this use of this measure (following Lubkin and subsequent authors, we shall simply call it 'Haar'
measure from now on) showed that a randomly chosen pure density operator, $\rho_{mn} = |\Psi\rangle\langle\Psi|$ (without any restriction
on energy or anything else) on the tensor-product Hilbert space, $\mathcal{H}_m \otimes \mathcal{H}_n$, of a pair of quantum systems - $\mathcal{H}_m$ being
$m$-dimensional and $\mathcal{H}_n$ being $n$-dimensional - will, for fixed $m$ and $n \gg m$, have, with high probability, a reduced
density operator, $\rho_m$, on $\mathcal{H}_m$, which is close to the maximally mixed density operator - with components, in any
Hilbert space basis, ${\rm diag}(1/m, \ldots, 1/m)$. We shall discuss further this result of Lubkin and some related developments
in Section X at the beginning of Part 2 since they will be needed as a preliminary towards our argument for Equation
(15) and the related claimed proposition in Section I D.
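Lubkin's statement can be probed numerically. The following is a minimal sketch under stated assumptions, not a reproduction of Lubkin's calculation: it draws one unit vector uniformly from the unit sphere of $\mathcal{H}_m \otimes \mathcal{H}_n$ by normalizing a vector of independent complex Gaussian amplitudes (a standard way of sampling the uniform measure on a sphere), takes the partial trace over the $n$-dimensional factor and prints the moduli of the entries of the resulting $m \times m$ matrix. The dimensions, the seed and the function names are hypothetical choices.

```python
import random

def random_pure_state(m, n, rng):
    """Uniformly random unit vector on C^m (x) C^n, stored as an m x n array of amplitudes."""
    amps = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
            for _ in range(m)]
    norm = sum(abs(a) ** 2 for row in amps for a in row) ** 0.5
    return [[a / norm for a in row] for row in amps]

def reduced_density_matrix(psi):
    """Partial trace over the n-dimensional factor:
    rho[a][b] = sum_j psi[a][j] * conj(psi[b][j])."""
    m = len(psi)
    return [[sum(x * y.conjugate() for x, y in zip(psi[a], psi[b]))
             for b in range(m)] for a in range(m)]

rng = random.Random(0)  # fixed seed so the run is repeatable
m, n = 3, 2000          # hypothetical dimensions with n much larger than m
rho = reduced_density_matrix(random_pure_state(m, n, rng))

for row in rho:
    print([round(abs(z), 3) for z in row])
```

With $n \gg m$ the diagonal entries come out close to $1/m$ and the off-diagonal entries close to zero, in line with the maximally mixed form ${\rm diag}(1/m, \ldots, 1/m)$.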



In essence, one might characterize the relation between Lubkin's work and the work, [9], of Goldstein, Lebowitz et 
al. by saying that Lubkin obtained microcanonicality of a small subsystem from randomness of a totem pure state 
while Goldstein, Lebowitz et al. obtained canonicality of a small subsystem when an, otherwise random, totem pure 
state is constrained to have a definite energy. (Popescu et al. [12] then generalized these developments by allowing
for more general constraints, and also made them mathematically rigorous.) 

The modern (see Endnote [13]) results, [9, 12], of Goldstein, Lebowitz et al. and of Popescu et al. are an advance
on the traditional results in that they replace the assumption of a total microcanonical state by the assumption of
a total pure state. However, they share the limitation of the traditional approach of being capable of
explaining how, at most, only a small subsystem of a given 'large' totem can get to be (approximately) thermal. The
main purpose of the present paper will be to explore to what extent, and/or under what altered circumstances, this
limitation can be overcome. Our main motivation relates to the theory of quantum black holes. Black holes are a
puzzle in relation to the above results if one believes, as seems compelling, that the totem consisting of a black hole in
equilibrium with its atmosphere in a box at approximately fixed energy is completely (approximately) thermal [16].



B. Quantum black holes 

In such black hole equilibrium states we may roughly (albeit not exactly, see Endnote (iii) in [18]) identify the black
hole itself with 'gravity' and the atmosphere with 'matter'. In an earlier proposal (see [17, 18] and especially Endnotes
(i), (ii), (iii) and (v) in [18]) of the author (which predated the work [9, 12], in a more general, but non-gravitational
[25], context, of Goldstein, Lebowitz et al. and of Popescu et al. by around seven years) a radically-different-from-usual
hypothesis was put forward as to the nature of quantum black hole equilibrium states according to which the
total state is a pure state (in line with what we are calling here the 'modern' approach - see Endnote [13] - but in
contrast to the usual assumption in work on quantum black holes that it is a Gibbs state at the Hawking temperature)
while the reduced state of the gravitational field alone and also the reduced state of the matter fields alone are each
thermal (i.e. each Gibbs states) at the appropriate Hawking temperature (see below). (Here we use the word 'matter'
to include e.g. the electromagnetic field.) Below, we shall sometimes call such a total pure state bithermal. This
hypothesis formed, in turn, just a part of our wider hypothesis [17-19] (which we shall sometimes refer to here as our
matter-gravity entanglement hypothesis) according to which, quite generally, one should always take into account the
quantum gravitational field as well as all matter fields in describing the full dynamics of any physically closed totem,
and that, while the state of the totem is always pure and evolves unitarily, the 'physically relevant' quantum state
is to be identified with the reduced density operator of the matter alone and, concomitantly (see Section I E and, in
particular, Endnote [48]), the physical entropy of a closed totem is to be identified with its matter-gravity entanglement
entropy. Interpreted according to this wider hypothesis, our hypothesis that quantum black hole equilibrium states
are bithermal then implies that, physically, such states are completely thermal. We remark that, given our wider
hypothesis, what is required for this complete thermality is, of course, just thermality of the reduced state of the
matter. However, there are strong reasons (particularly the fact [4] that the Euclideanized Schwarzschild metric is
periodic in imaginary time with period $8\pi GM$) for believing that the mathematical nature of the reduced state of
gravity will also be thermal and this is what we have assumed above and will continue to assume in the remainder of
this section and in Section IX.

To summarize and also to recall the relevant formulae: While we accept the (conventional) belief that, in black
hole equilibria, both matter and gravity are each separately thermal at the Hawking temperature, $T_H$, we propose
(unconventionally in comparison to other work on quantum black holes) that the total state of matter-gravity is pure
(rather than itself being a thermal state). The thermality of each of the reduced states (i.e. of matter and of gravity
separately) will then arise as the result of entanglement between matter and gravity in the pure totem state. We shall
refer to this picture of black hole equilibrium states as our entanglement picture of black hole equilibrium. (We shall
assume in Section IX and in [22, 23] that, in this picture, the overall (i.e. totem) state of black hole equilibrium is not
only pure but also close to an energy eigenstate.) We further emphasize that while this proposal is unconventional
when compared to other work on quantum black holes, it seems to fit well with modern approaches (such as those
of [9, 12]) towards understanding the origin of thermality which have recently been proposed in non-gravitational
contexts. Here, we recall that the Hawking temperature, $T_H$, is given [2, 4], in the case of a Schwarzschild (i.e.
spherical, uncharged) black hole of mass $M$, by $T_H = 1/8\pi GM$ (in general the surface gravity divided by $2\pi$).
Here, $G$ denotes Newton's constant and we set $c$ and $\hbar$ to 1. Moreover, we accept the conventional belief that the
physical entropy - again in the spherical, uncharged case - has the Hawking value of $4\pi GM^2$ (in general, one quarter
of the area of the event horizon, divided by $G$) and what is new about our proposal is our claim that this entropy-value
should ultimately be explainable as the matter-gravity entanglement entropy of a pure state of the overall
matter-gravity totem.

Finally, we note that our matter-gravity entanglement hypothesis and our entanglement picture of black hole
equilibrium also offer a natural resolution to the Information Loss Puzzle [20]. This puzzle arose because, as long
as it was believed that black holes were correctly described by mixed states, then, in a dynamical process in which
black holes were formed from collapsing stars etc., it appeared that an initial pure state would dynamically evolve
into a mixed state, contradicting unitarity. On the other hand, there is no difficulty in reconciling our matter-gravity
entanglement hypothesis with a unitary quantum mechanical time evolution and, once we identify entropy as
matter-gravity entanglement entropy, this is entirely consistent with increasing entropy (i.e. information loss). We note that
this proposed resolution to the Information Loss Puzzle is, in fact, just a special case of our proposed resolution to
the Second Law Puzzle [17, 18].



C. Our specific question 

The specific question we shall endeavour to answer in this paper assumes, as its basic setting, that a totem be given
which consists of a pair of weakly coupled systems, S and B, each with its own Hilbert space, $\mathcal{H}_S$ and $\mathcal{H}_B$, and each
with its own density of states, $\sigma_S$ and $\sigma_B$.

Our specific question is then: 

If the systems, S and B, are of comparable size [24], what modifications need to be made either to the traditional
'total microcanonical state' approach or, more relevantly since we believe it to be a step closer to the right answer, to
the more modern 'total pure state' approach of Goldstein, Lebowitz et al. and of Popescu et al. and others, as described
above, so as to ensure that when the totem has a total state with energy in an interval $[E, E + \Delta]$, the reduced states
of S and B will each likely be approximately thermal states (and, in particular, in the 'total pure state' approach, the
total state will likely be approximately bithermal)?

(What is meant here by 'comparable size' has, of course, to be encoded into the functional form of the densities of
states $\sigma_S(\epsilon)$ and $\sigma_B(\epsilon)$. How this is done will be clear from the specific examples we discuss.)

We hope the answers we obtain below may be of interest in their own right and that the formalism we deploy to
answer them may find a variety of other applications. But the immediate application we have in mind is to the theory
of quantum black holes. In Section IX and in our two companion papers, [22, 23], we shall argue that our answers help
to strengthen the case for, and give concrete form to, our matter-gravity entanglement hypothesis and particularly
our entanglement picture of black hole equilibrium discussed in Section I B.



D. Answers 



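The key to answering our specific question, in the 'traditional total microcanonical state' approach is the formula
(11) which we already gave above for the reduced density operator, $\rho_S^{\rm microc}$, on S.
We claim that the appropriate replacement for this formula in the 'modern total-pure state approach' is

$$\rho_S^{\rm modapprox} = M^{-1}\left(\sum_{\epsilon=\Delta}^{E_c} n_B(E - \epsilon)\sum_{i=1}^{n_S(\epsilon)} |\epsilon, i\rangle\langle\epsilon, i| + \sum_{\epsilon=E_c+\Delta}^{E} n_S(\epsilon)\sum_{i=1}^{n_B(E - \epsilon)} |\epsilon, i\rangle\langle\epsilon, i|\right). \qquad (15)$$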

On the right hand side of this equation, we continue to assume the spectrum to be slightly distorted in the way we
explained before equation (6), $n_S$ and $n_B$ to be defined as in (4) and (5) and the sums to be over integral multiples of
$\Delta$, and we also continue to assume, as will be the case in our examples in Part 1, that $\sigma_S$ and $\sigma_B$ are monotonically
increasing functions - defining $E_c$ to be the energy value at which $\sigma_S(E_c) = \sigma_B(E - E_c)$. When $\epsilon > E_c$, the $|\epsilon, i\rangle$ then
denote the elements of an orthonormal basis of an $n_B(E - \epsilon)$-dimensional subspace of the ($n_S(\epsilon)$-dimensional) energy-$\epsilon$
subspace of $\mathcal{H}_S$ which will depend on $\Psi$. As we shall see, this dependence on $\Psi$ will not matter for the developments
in Part 1. We will postpone a full explanation of the way in which the subspace depends on $\Psi$ to Section XII in
Part 2.

It is important to notice that, as is easy to check, the constant, $M$, by which one needs to divide in order to
normalize (15) has the same value, given by (7) and (9) (and as explained after those equations, equal to the total
number of states of the totem with energy in the interval $[E, E + \Delta]$) as the constant, $M$, by which one needs to divide
in order to normalize (11). Moreover, while the states, $\rho_S^{\rm microc}$ and $\rho_S^{\rm modapprox}$, are clearly (usually) very different,
both states share the same energy probability density, $P_S(\epsilon)$ (10). (There is of course a similar pair of equations to
(11) and (15) with obvious reversals of the letters 'S' and 'B' and, in the case of (15), with $E_c$ replaced by $E - E_c$.)
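The two statements just made (common normalization constant and common energy probability density) can be checked level by level. The sketch below is illustrative only: it lists, for each system level, the eigenvalue and multiplicity that (11) and (15) assign to that level, with $n_S$, $n_B$ and $M$ taken from (4), (5) and (7). The function name, the power-law densities and the values of `E` and `delta` are hypothetical, degeneracies are treated as real numbers, and the condition $\epsilon \le E_c$ is implemented through the equivalent comparison $n_S(\epsilon) \le n_B(E - \epsilon)$, which assumes monotonically increasing densities.

```python
def block_spectra(sigma_s, sigma_b, E, delta):
    """Eigenvalue data of the microcanonical (11) and modapprox (15) states.

    Returns one tuple (eps, micro_eig, micro_mult, mod_eig, mod_mult) per
    system level eps = delta, 2*delta, ..., E.
    """
    steps = round(E / delta)
    rows = []
    for k in range(1, steps + 1):
        eps = k * delta
        n_s = sigma_s(eps) * delta       # Eq. (4)
        n_b = sigma_b(E - eps) * delta   # Eq. (5), evaluated at E - eps
        rows.append((eps, n_s, n_b))
    M = sum(n_s * n_b for _, n_s, n_b in rows)  # Eq. (7)

    out = []
    for eps, n_s, n_b in rows:
        # For monotonically increasing densities, eps <= E_c is equivalent
        # to n_S(eps) <= n_B(E - eps).
        if n_s <= n_b:
            mod_eig, mod_mult = n_b / M, n_s
        else:
            mod_eig, mod_mult = n_s / M, n_b
        out.append((eps, n_b / M, n_s, mod_eig, mod_mult))
    return out

# Hypothetical densities of states, chosen only for illustration.
sigma_s = lambda eps: eps ** 2
sigma_b = lambda eps: eps ** 3

spec = block_spectra(sigma_s, sigma_b, E=10.0, delta=0.5)
trace_micro = sum(eig * mult for _, eig, mult, _, _ in spec)
trace_mod = sum(eig * mult for _, _, _, eig, mult in spec)
print(trace_micro, trace_mod)
```

In each level the product of eigenvalue and multiplicity is $n_S(\epsilon) n_B(E - \epsilon)/M$ for both states, so the traces and the energy probabilities coincide even though the spectra differ above $E_c$.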



We now claim that the sense in which (15) is the appropriate replacement for (11) in the modern approach is then
made clear by the following proposition, our argument for the correctness of which is given in (and is the main purpose
of) Part 2:

Proposition. [21] For a given, randomly chosen, pure state, $\Psi$, on the Hilbert space of our totem, with energy
restricted to be in the range $[E, E + \Delta]$, the reduced density operator, $\rho_S^{\rm modern}$, of the system may, as far as physical
quantities of interest are concerned, with very high probability, be considered to be very close to the $\rho_S^{\rm modapprox}$ of (15)
for the appropriate (i.e. to the chosen vector $\Psi$) $n_B(E - \epsilon)$-dimensional subspaces of $\mathcal{H}_S$ spanned by the $|\epsilon, i\rangle$ (see
above and Part 2). (And a similar statement of course holds with system, S, replaced by bath, B.)

What makes this proposition particularly useful is the fact that, while the $n_B(E - \epsilon)$-dimensional subspaces (spanned
by the $|\epsilon, i\rangle$) of $\mathcal{H}_S$ will depend on the choice of $\Psi$ (in a way which we shall explain in Section XII in Part 2 where we
point out, by the way, that they might themselves be said to be 'random subspaces') as is easy to see and as we shall
illustrate in Part 1, the values of physical quantities of interest, such as the mean energy and the von Neumann entropy
of the system S (see (16) below and Section I E) calculated using $\rho_S^{\rm modapprox}$, do not depend on which $n_B(E - \epsilon)$-dimensional
subspaces they are. Therefore we can conclude that, to the extent that the approximation of $\rho_S^{\rm modern}$ by
$\rho_S^{\rm modapprox}$ is good (and we shall argue in Part 2 that, in our situations of interest, and when it is used for the purpose
of calculating mean energy and entropy, it is very good) the actual values of these quantities must (with a very high
probability) be largely independent of the choice of $\Psi$. (Aside from mean energy, in fact we expect the entire energy
probability density function, $P_S(\epsilon)$, will most likely be close to that of $\rho_S^{\rm modapprox}$ and hence also, similarly, for higher
moments of the energy.)

Above, we recall that, for an arbitrary density operator, $\rho$, the von Neumann entropy is given by the formula

$$S(\rho) = -{\rm tr}(\rho\log\rho). \qquad (16)$$
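For density operators that are diagonal with degenerate eigenvalues, as (11) and (15) are in the basis of the $|\epsilon, i\rangle$, formula (16) reduces to a sum over distinct eigenvalues weighted by their multiplicities. The following is a minimal sketch of that evaluation; the function name and the two test spectra are illustrative assumptions, and the natural logarithm is used.

```python
import math

def von_neumann_entropy(spectrum):
    """S = -sum(mult * p * log(p)) over (eigenvalue, multiplicity) pairs.

    Zero eigenvalues are skipped, following the convention 0 * log(0) = 0.
    """
    return 0.0 - sum(mult * p * math.log(p) for p, mult in spectrum if p > 0.0)

m = 8
maximally_mixed = [(1.0 / m, m)]  # diag(1/m, ..., 1/m)
pure = [(1.0, 1)]                 # a single rank-one projector

print(von_neumann_entropy(maximally_mixed), math.log(m))
print(von_neumann_entropy(pure))
```

Because only eigenvalues and multiplicities enter, the result does not depend on which subspaces carry the eigenvalues, which is the property used in the discussion above.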



We remark that it is easy to see from a comparison between (11) and (15) that the above proposition implies the
'Canonical Typicality' result [9] of Goldstein, Lebowitz et al., thus fulfilling our promise in Section I A to re-obtain the
latter. For, in the relevant limit (see after Equation (12)) $E_c$ in (15) will tend to $E$ and therefore $\rho_S^{\rm modapprox}$ (15) will
tend to $\rho_S^{\rm microc}$ (11) which, in turn, will tend, by the traditional argument we reviewed in Section I A, to a Gibbs state
(namely the $\rho^{\rm Gibbs}_{S,\beta}$ of (13) for $\beta$ equal to the limiting value of $N_B/E$).

To start now to address our specific question, we first observe that, whatever the densities of states, $\sigma_S$ and $\sigma_B$
(provided only they are monotonically increasing) as long as the total energy, $E$, of our totem is finite, then, of course
neither of the density operators, $\rho_S^{\rm microc}$ (11) and $\rho_S^{\rm modapprox}$ (15), can be exactly thermal. To see this easily, it suffices
to notice that the energy probability density, $P_S(\epsilon)$ (10), which these states share will obviously be zero for $\epsilon > E$,
whereas, when $\sigma_S(\epsilon)$ is sufficiently slowly growing for $\rho^{\rm Gibbs}_{S,\beta}$ (see (13)) to exist, the energy probability density for the
Gibbs state (13) will obviously take the form

$$P_S^{\rm Gibbs}(\epsilon) = Z_{S,\beta}^{-1}\,\sigma_S(\epsilon)\exp(-\beta\epsilon), \qquad (17)$$

where $Z_{S,\beta}$ is as in (14), which (for a rising density of states $\sigma_S$) will be non-zero for all $\epsilon$. However, one can ask
whether $\rho_S^{\rm microc}$ and/or $\rho_S^{\rm modapprox}$ can be approximately thermal, say at sufficiently low energies.



We shall find that, for physically ordinary densities of states such as (cf. the discussion around (12))

$$\sigma_S(\epsilon) = A_S\,\epsilon^{N_S}, \qquad \sigma_B(\epsilon) = A_B\,\epsilon^{N_B} \qquad (18)$$

when system, S, and energy bath, B, are large and of comparable size - i.e. when $N_S$ and $N_B$ are both large, but
comparably sized numbers - neither $\rho_S^{\rm microc}$ nor $\rho_S^{\rm modapprox}$ can even be approximately thermal. In particular,
this is the case when both system and energy-bath densities of states are identical (i.e. when $A_S = A_B$ and $N_S = N_B$).
Rather, we will show that, when system and energy bath are of comparable size (or identical) in the sense just
explained, the energy probability density of both S and B will, instead of having the behaviour one would expect
of a thermal state, deviate from the most likely distribution of energies between S and B according to a Gaussian
probability distribution with width of the order of $E$ divided by the square root of $N_S$ (equivalently $N_B$).

On the other hand, we shall show that in certain well-defined senses, 'approximately thermal' states are obtained 
for system, S, and energy bath, B, both on the traditional total microcanonical state approach and also on the modern 
total pure state approach if they both have identical densities of states which either rise exponentially with energy or 
rise as 'quadratic exponentials' - i.e. each as the exponential of a constant times the square of the energy - the notion 
of 'approximately thermal' depending both on the approach (i.e. the traditional total microcanonical state approach 
or the modern total pure state approach) and also on the behaviour of the densities of states (i.e. on whether they 
rise as the exponential of energy or of energy squared). See especially the notions of '$E$-approximately thermal' and '$E$-approximately semi-thermal' introduced in Section III for the case of an exponentially rising density of states. (The extent to which these results generalize to non-identical densities of states is briefly discussed for the exponential case in Endnote [29] to Section III.)



E. Results on the origin of entropy 

Although it is not indicated in our title, besides our main question concerning the origin of thermality, we shall be greatly concerned, throughout the paper, with the origin of entropy. And we are particularly interested in understanding how the very large entropies of black holes come about.

To this end, we will obtain formulae (Equations (54), (55) in Section V and Equations (69) and (70) in Section VI) for the entropy of our system, S, on both traditional and modern approaches, when system and energy bath both have either identical exponential or identical quadratic exponential densities of states. (We will also obtain formulae for the mean energy of S and B.) In the traditional approach, this is simply the mean entropy of the reduced density operator of the system when the totem is in a microcanonical state with given energy, $E$. In the modern approach, we remark, first, that, for every pure totem state, whether or not S and B have identical densities of states, the system entropy is necessarily always equal to the energy-bath entropy and both of these quantities are, in fact, identical [48] with the {system}-{energy bath} entanglement entropy. Second, the value of the entropy in the modern case is to be interpreted, in the light of our proposition, as the value that the system entropy (= energy-bath entropy = {system}-{energy bath} entanglement entropy) of a randomly chosen totem pure state will, with very high probability, be very close to. One of the most significant of our overall conclusions, dependent on our proposition, which we argue for in Part 2, is the fact that there is such a value at all - i.e. the fact that, with our basic general assumptions and for system and energy-bath densities of states of the sorts we discuss, the vast majority of totem states will have a system entropy close to one single value, namely $-\mathrm{tr}(\rho_S^{\rm modapprox}\log\rho_S^{\rm modapprox})$. In terms of the language of Quantum Information Theory, this may be stated in the following way (below we temporarily suspend our terminological conventions, calling both S and B 'systems' and our 'totem' the 'total system'):
Given two comparably-sized large systems (S and B), which are either uncoupled or weakly coupled, then (for physically reasonable densities of states and even some maybe physically unreasonable ones) if their total state is a random pure state, their degree of entanglement (as measured by their entanglement entropy, $S$) will, with high probability, be close to the single value $-\mathrm{tr}(\rho_S^{\rm modapprox}\log\rho_S^{\rm modapprox})$.

(Similarly, we expect that the mean value of the energy of the system, S, will, with high probability, be close to the single value $\mathrm{tr}(\rho_S^{\rm modapprox}H_S)$. Indeed we expect the full energy probability density function, $P_S(\epsilon)$ [and hence also other moments of the energy], of S to be, with high probability, close to that of $\rho_S^{\rm modapprox}$ [and similarly with S replaced by B].)

Our results are that, for a totem with total energy $E$, for identical exponentially rising densities of states, $\sigma_S(\epsilon) = \sigma_B(\epsilon) = ce^{b\epsilon}$, on the traditional approach, the entropy, $S_S^{\rm microc}$, will be $bE/2$ (up to a logarithmic correction) while, on the modern approach, the entropy (i.e. the single value as discussed in the previous paragraph), $S_S^{\rm modapprox}$, will be $bE/4$ (up to a logarithmic correction). For identical quadratic exponential densities of states, $\sigma_S(\epsilon) = \sigma_B(\epsilon) = Ke^{q\epsilon^2}$, we find that $S_S^{\rm microc} = qE^2/2$ (up to a correction of order 1 in $E$), while $S_S^{\rm modapprox}$ will be tiny (i.e. a term of order 1 in $E$). (In both traditional and modern cases and with both equal exponential and equal quadratic exponential densities of states the mean energy of both system and energy bath will, of course, be $E/2$ - in the modern case, 'mean energy' here meaning the value, $\mathrm{tr}(\rho_S^{\rm modapprox}H_S)$, that the mean energy of a random pure totem state will most likely be very close to.)



F. Outline of the rest of the paper 



We shall give full details of the results outlined in Section I D in Part 1, the main sections of which comprise Section II, which discusses the case where the density of states of both system and energy bath goes as a power of the energy, Sections III and V, which discuss the exponential case, and Section VI, which discusses the quadratic exponential case. Section IV develops the mathematical formalism to enable efficient computation of the expected energy and entropy of system, S, and energy bath, B, for the states $\rho_S^{\rm microc}$ and $\rho_S^{\rm modapprox}$, and this formalism is applied in Sections V and VI to obtain formulae for these quantities in the cases of exponential and quadratic exponential densities of states.

Two further sections, VII and VIII, discuss some further related matters and can be skipped on a first reading. Section VII discusses the special features of the entropy, in both modern and microcanonical cases, when the densities of states of system and energy bath are such that the energy probability density (10) is sharply peaked (as is, for example, the case for our power law densities of states) and derives some general formulae which enable us, e.g., to calculate the entropy for the states considered in Section II. In passing, we clarify the relation with some traditional work on the microcanonical ensemble (where peaks are normally presupposed) and dispel some myths. We also discuss the connection between the sum of the entropies of the partial states of system and energy bath with the totem entropy $\log(M)$. In Section VIII we point out that if some of our basic assumptions are relaxed, then the prospects for systems to become hot become much less constrained and, in particular, there are ways in which a system can be hot while the totem is in a pure state which differ from the 'modern' scenarios we discuss below. In particular, we discuss the notion of 'purification' (closely related to 'thermofield dynamics').

The entropy formulae we obtain in Sections V and VI (as outlined at the end of Section I E) will play an important role in Section IX and in two companion papers [22, 23] where we discuss the application of the ideas and formulae of these sections to the theory of quantum black holes. In Section IX A we point out an intriguing resemblance between our entropy and temperature formulae for quadratic exponential densities of states in the microcanonical strand of Section VI and Hawking's energy and temperature formulae for (Schwarzschild) quantum black holes, and point out an apparent lack of success for the modern strand of Section VI in modelling black holes. However, we argue that it is difficult to conclude anything decisive from these observations since (at least in a description in terms of a quantized Einsteinian metric) black holes presumably do not satisfy the basic assumptions underlying our results here - in particular our assumption (see Equation (2)) of weak coupling.

What seems more promising is a connection between the formulae and results for entropy and temperature which we obtain in Sections III and V for exponentially growing densities of states and scenarios in which quantum black holes are viewed as strong string-coupling limits of certain states of weakly coupled strings. In Section IX B and in our two companion papers, [22] and [23], we recall some of the existing work [38-41] in this direction, and point out
that, despite its great computational success, what is computed in this work is the degeneracy of certain black hole 
states; the fact that the resulting degeneracy formulae happen to agree with the previously known values of black hole 
entropy does not seem to have been explained hitherto. We then go on to propose a modification of the existing string 
theory scenario, and in particular of the work of Susskind [35] and Horowitz and Polchinski [1PJ HI] based on the 
modern strand of the present paper and on our matter-gravity entanglement hypothesis and our entanglement picture 
of black hole equilibrium (see Section I B). We argue that this modified scenario, which is based on an understanding of black hole equilibrium states as strong string-coupling limits of equilibria involving a long string coupled to a stringy atmosphere, does offer an explanation of black hole entropy and thereby also a satisfactory resolution to the Information Loss Puzzle. The companion paper [22] gives a brief announcement of the main results of the present paper with a focus on the main results and formalism, as well as discussing further our matter-gravity entanglement hypothesis and outlining the application of that, with the results of Sections III and V, to this string scenario. The further companion paper [23] develops the string scenario further.

Part 2, which comprises Sections X, XI, XII and XIII, clarifies the meaning of Equation (15) and presents our arguments in favour of our proposition in Section I D. A fuller description of the contents of Part 2 is given towards the end of Section IX.



Part 1: Results for power law, exponential and quadratic exponential (equal) densities of states

II. POWER-LAW DENSITIES OF STATES



If S and B have densities of states as in ( 18 ) then, by ([£]) and the remarks in the subsequent paragraph, we have 
that M, i.e. the total number of totem states with energy in [E, E + A], is given by 



$$M = A_S A_B \Delta \int_0^E \epsilon^{N_S}(E-\epsilon)^{N_B}\,d\epsilon \qquad (19)$$

which can be rewritten

$$M = A_S A_B \Delta\, E^{N_S+N_B+1}\, B(N_S+1, N_B+1) \qquad (20)$$

where $B(x,y)$ is the usual beta function (see e.g. [3D]) - related to the gamma and factorial functions by

$$B(x+1,y+1) = \frac{\Gamma(x+1)\Gamma(y+1)}{\Gamma(x+y+2)} = \frac{x!\,y!}{(x+y)!\,(x+y+1)}. \qquad (21)$$

(For fractional arguments, we take $x!$ to mean $\Gamma(x+1)$.) On the other hand, the proportion of such totem states with system energy in an interval $[\epsilon, \epsilon+\delta]$ will, for suitable $\delta$, be well-approximated by

$$P_S(\epsilon)\delta = A_S A_B\,\delta\Delta M^{-1}\epsilon^{N_S}(E-\epsilon)^{N_B} = A_S A_B\,\delta\Delta M^{-1}E^{N_S+N_B}\left(\frac{\epsilon}{E}\right)^{N_S}\left(1-\frac{\epsilon}{E}\right)^{N_B}. \qquad (22)$$

Thus, combining (20), (22) and (21) we have that

$$P_S(\epsilon) = \frac{N_S+N_B+1}{E}\; b\!\left(N_S;\, N_S+N_B,\, \frac{\epsilon}{E}\right) \qquad (23)$$

where (see e.g. [2T>])

$$b(k;n,p) = \frac{n!}{k!\,(n-k)!}\; p^k(1-p)^{n-k} \qquad (24)$$

is, when $n$ and $k$ are integers, the binomial distribution function which has the famous interpretation as the probability that $n$ 'Bernoulli' trials, each with probability $p$ for success and $q = 1-p$ for failure, result in $k$ successes and $n-k$ failures. In order to take advantage of the insight afforded by this connection with probability theory we shall (with negligible error when $N_S$ and $N_B$ are large) assume from now on that $N_S$ and $N_B$, if not already integers, are replaced by their nearest integers.
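The identification (23) of the energy probability density with a rescaled binomial distribution can be checked numerically. The sketch below is only an illustration under stated assumptions: integer $N_S$ and $N_B$, arbitrary sample values for $E$, $N_S$, $N_B$, and a simple midpoint rule for the normalisation integral; the function names are hypothetical and not taken from the paper.

```python
import math

def binomial_pmf(k, n, p):
    """b(k; n, p) of Eq. (24) for integer k and n."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

def p_s_binomial(eps, E, n_s, n_b):
    """Right-hand side of Eq. (23)."""
    n = n_s + n_b
    return (n + 1) / E * binomial_pmf(n_s, n, eps / E)

def p_s_direct(eps, E, n_s, n_b, steps=20000):
    """eps^N_S (E - eps)^N_B, normalised numerically on [0, E] (midpoint rule)."""
    h = E / steps
    norm = h * sum(
        ((i + 0.5) * h) ** n_s * (E - (i + 0.5) * h) ** n_b for i in range(steps)
    )
    return eps**n_s * (E - eps) ** n_b / norm

# Illustrative values only.
E, n_s, n_b = 2.0, 3, 5
pairs = [(p_s_binomial(e, E, n_s, n_b), p_s_direct(e, E, n_s, n_b))
         for e in (0.3, 0.7, 1.2)]
```

Each pair holds the binomial form and the directly normalised form of $P_S(\epsilon)$ at the same energy, so the two entries should agree to within the quadrature error.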

First we notice that we may use the well-known connection between the binomial and the Poisson distribution to give an alternative derivation of the fact that, in the limit as $E$ and $N_B$ grow while $N_S$ remains constant and the ratio $N_B/E$ converges to $\beta$, S's energy probability density, $P_S(\epsilon)$ (23), converges to the Gibbs energy probability density $P_{S,\beta}^{\rm Gibbs}(\epsilon)$ (see (17) and (14)) with inverse temperature $\beta = N_B/E$ for $\sigma_S(\epsilon)$ as in (18) - the latter Gibbs energy probability density being given explicitly by

$$P_{S,\beta}^{\rm Gibbs}(\epsilon) = \frac{\beta^{N_S+1}}{N_S!}\,\epsilon^{N_S}e^{-\beta\epsilon} \qquad (25)$$

as one sees from (17) and (18) after easily checking from (14) and (18) that

$$Z_{S,\beta}^{-1} = \frac{\beta^{N_S+1}}{A_S\,N_S!}.$$

This convergence result is of course a special case (i.e. the case where S, as well as B, has a power-law density of states) of an easy corollary both of the traditional thermality result (on the total microcanonical state approach) and (bearing in mind the equality of the energy probability density for both (11) and (15)) of the 'Canonical Typicality' result of Goldstein, Lebowitz et al. (on the 'modern' total pure state approach) which, as we discussed in Section I, both hold in the same limit; we shall see shortly that the alternative proof which we next give for this corollary easily implies an alternative proof of the traditional thermality result itself and thus also, by a remark we made in Section I D, an alternative argument for 'Canonical Typicality' when the system and energy-bath densities of states both have power-law form.

As Feller puts it in [26], "If $n$ is large and $p$ is small, whereas the product $\lambda = np$ is of moderate magnitude" then the binomial distribution goes over to the Poisson distribution, i.e.

$$b(k;n,p) \approx \frac{\lambda^k}{k!}\,e^{-\lambda}. \qquad (26)$$

In particular (cf. e.g. [27]) for fixed $k$, the right hand side of (26) is the limit of $b(k;n,p)$ as $n \to \infty$ while $p \to 0$ in such a way that $np \to \lambda$. From this, and (23), we easily conclude that the limit, as $E \to \infty$ while $N_B/E \to \beta$ with $N_S$ fixed, of $P_S(\epsilon)$ is equal to $P_{S,\beta}^{\rm Gibbs}(\epsilon)$ (25). So, to summarize, in the appropriate limit of a large energy bath, the energy probability density of S goes over to the energy probability density of the appropriate Gibbs state:

$$P_S(\epsilon) \to P_{S,\beta}^{\rm Gibbs}(\epsilon) = \frac{\beta^{N_S+1}\epsilon^{N_S}e^{-\beta\epsilon}}{N_S!}. \qquad (27)$$
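The large-bath limit (27) can be illustrated numerically by holding $\beta = N_B/E$ and $N_S$ fixed while $N_B$ grows. The following sketch assumes integer exponents and arbitrary illustrative values of $\beta$, $N_S$ and $\epsilon$; the function names are hypothetical.

```python
import math

def p_s_exact(eps, E, n_s, n_b):
    """Eq. (23): P_S(eps) = ((n + 1)/E) b(N_S; n, eps/E), with n = N_S + N_B."""
    n = n_s + n_b
    p = eps / E
    return (n + 1) / E * math.comb(n, n_s) * p**n_s * (1.0 - p) ** n_b

def p_gibbs(eps, beta, n_s):
    """Gibbs energy probability density of Eqs. (25) and (27)."""
    return beta ** (n_s + 1) * eps**n_s * math.exp(-beta * eps) / math.factorial(n_s)

beta, n_s, eps = 1.5, 2, 1.0  # illustrative values only
rel_errors = []
for n_b in (10, 100, 1000, 10000):
    E = n_b / beta  # keeps the ratio N_B / E equal to beta
    target = p_gibbs(eps, beta, n_s)
    rel_errors.append(abs(p_s_exact(eps, E, n_s, n_b) - target) / target)
```

The list `rel_errors` should shrink roughly in proportion to $1/N_B$, which is the expected rate for the binomial-to-Poisson limit.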



We remark that, by inspecting (17) and (14) this is easily seen to be equivalent to the statement that, in the same limit,

$$M^{-1}n_B(E-\epsilon) \to Z_{S,\beta}^{-1}e^{-\beta\epsilon}$$

and, by inspecting (11) and (13), one easily sees that this implies that in the same limit

$$\rho_S^{\rm microc} \to \rho_{S,\beta}^{\rm Gibbs}$$

thus providing the alternative proof, which we promised in Section I, of the traditional result on the thermality of a small system in contact with an energy bath in the traditional limit of a large energy bath, in the case where both the energy bath, B, and system, S, have a power-law density of states (and by our remark in Section I D thus also providing an alternative proof of 'Canonical Typicality' for such S and B).

We will now demonstrate, however, that, when S and B are of comparable size, then, if they both have power-law densities of states as in (18), both the total microcanonical state approach and the 'modern' approach (i.e. with a total pure state) predict that the reduced density operators of each of S and B will be quite different from thermal! We shall show this by showing that the energy probability density of each of S and B (which we again recall from the paragraph after (15) is the same in each approach) will have a quite different form from the thermal form of $P_{S,\beta}^{\rm Gibbs}(\epsilon)$.

First we notice that, when $k$ is a fixed fraction, $pn$, of $n$ (in such a way that $0 < p < 1$ and also $pn$ is an integer) then, if $p$ and $n$ are regarded as fixed, the binomial distribution function (24) $b(pn;n,p')$ is maximized when $p' = p$ and we easily obtain the approximation [28] (now writing $p' = p + x$)

$$b(pn;\,n,\,p+x) \approx \frac{1}{\sqrt{2\pi n p(1-p)}}\exp\left(-\frac{nx^2}{2p(1-p)}\right). \qquad (28)$$

(28) is obtained by expressing the left hand side in terms of factorials and powers according to (24). We then adopt Stirling's approximation, $N! \approx \sqrt{2\pi N}\,N^N e^{-N}$, for each of the factorials and, introducing $q = 1-p$, write the term $(p+x)^{np}(1-p-x)^{n(1-p)}$ as $p^{np}q^{nq}$ times $(1+x/p)^{np}(1-x/q)^{nq}$ and approximate the latter by $\exp(-nx^2/2pq)$. Clearly, as long as $n$ is extremely large and $p$ is not extremely close to zero or 1, then this will be an excellent approximation.
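The quality of the Gaussian approximation (28) can be gauged with a short numerical comparison. The sketch below evaluates the exact binomial term through `math.lgamma` (to avoid overflow for large $n$) and compares it with the right-hand side of (28); the values of $n$, $p$ and $x$ are illustrative assumptions only.

```python
import math

def log_binomial_pmf(k, n, p):
    """log b(k; n, p), using lgamma so that large n does not overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))

def gaussian_approx(n, p, x):
    """Right-hand side of Eq. (28)."""
    q = 1.0 - p
    return math.exp(-n * x * x / (2.0 * p * q)) / math.sqrt(2.0 * math.pi * n * p * q)

n, p = 10000, 0.4
k = round(p * n)  # pn is assumed to be an integer, as in the text
comparison = []
for x in (0.0, 0.002, 0.005):
    exact = math.exp(log_binomial_pmf(k, n, p + x))
    comparison.append((x, exact, gaussian_approx(n, p, x)))
```

For these values the neglected cubic and higher terms in $x$ are small, so the exact and approximate entries in `comparison` should differ only at the sub-percent level.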

Combining (23) with the definition of $p$ before (28) we see that, if we identify $n$ with $N_S + N_B$, then

$$P_S(\epsilon) = \frac{n+1}{E}\; b\!\left(pn;\, n,\, \frac{\epsilon}{E}\right) \quad\text{where}\quad p = \frac{N_S}{N_S+N_B} \qquad (29)$$

and that, provided $n$ is extremely large and S and B are of 'comparable size', which of course, in view of (29), corresponds exactly to $p$ not being extremely close to zero or 1, then, by (28), to a high degree of accuracy, we will have the approximation

$$P_S(\epsilon) \approx \frac{1}{E}\sqrt{\frac{\gamma}{\pi}}\exp\left(-\gamma\frac{(\epsilon-\epsilon_0)^2}{E^2}\right) \qquad (30)$$

i.e. a Gaussian with a peak located (see Section VII A for an alternative perspective on Equation (31)) at

$$\epsilon_0 = pE \;\;(p \text{ as in (29)}) = \frac{N_S}{N_S+N_B}\,E \qquad (31)$$

and

$$\gamma = \frac{n}{2p(1-p)} = \frac{(N_S+N_B)^3}{2N_SN_B} \qquad (32)$$

and there will of course be an obvious counterpart formula for the energy probability density, $P_B$, of B, similar to the above formula but with $p$ replaced by $1-p$. (This of course changes the value of $\epsilon_0$ but not of $\gamma$.) So the energy of S will be in a Gaussian band around a most likely energy of $\epsilon_0$, the energy of B will be in a Gaussian band around a most likely energy of $E - \epsilon_0$, each having the same width which will be $E$ divided by a number (i.e. $\sqrt{2\gamma}$) which is of the order of the square root of either of the (comparable!) numbers $N_S$, $N_B$. Moreover it is easy to see that, in both the traditional microcanonical and the modern total pure state approaches, the two energy probability densities will be perfectly anticorrelated - i.e. when S has energy in a small interval around $\epsilon$, then B will have energy in a similar small interval around $E - \epsilon$.

Above, by 'width' we mean the standard deviation, $s$, from the mean of the energy probability density, i.e.

$$s = \left(\overline{\epsilon^2} - \bar\epsilon^{\,2}\right)^{1/2} \quad\text{where}\quad \overline{\epsilon^n} = \int_0^E \epsilon^n P_S(\epsilon)\,d\epsilon \qquad (33)$$

($= \mathrm{tr}(\rho_S^{\rm microc}H_S^n) = \mathrm{tr}(\rho_S^{\rm modapprox}H_S^n)$ - cf. Section IV). (In the special case where $N_S = N_B = N$ say, one sees that $\gamma$ becomes $4N$ so the width, $s$, will be $E/2\sqrt{2N}$.) As we anticipated, this is a qualitatively very different behaviour from the energy probability density of thermal states and we conclude therefore, as promised, that, in both the traditional total microcanonical and the modern total pure state approaches, the reduced density operators of S and B must, when S and B are of comparable size, be quite different in character from thermal density operators. To illustrate this point, we include a figure (Figure 1) for the energy probability density, $P_S(\epsilon)$, in the case S and B have the same density of states $\sigma(\epsilon) = A\epsilon^N$ for the (unrealistically small) value $N = 10$ and a comparison figure, Figure 2, showing the energy probability density, $P_{S,\beta}^{\rm Gibbs}(\epsilon)$, for a thermal state at the inverse temperature, $\beta = 2(N+1)/E$, chosen so that the mean energy takes the same value, $E/2$ - again in the case $N = 10$.

FIG. 1: Plot of the energy probability density (23), $P_S(\epsilon)$, in the case S and B have the same density of states $\sigma(\epsilon) = A\epsilon^N$ for the ('unusually' small) value $N = 10$.

For the sake of a quantitative result, we note that, for general $N$, the width, $s$, of the energy probability density, $P_{S,\beta}^{\rm Gibbs}(\epsilon)$, of this comparison thermal state is (as is easily calculated) $E/2\sqrt{N+1}$ - i.e. (to a very good approximation for large $N$) a factor of $\sqrt{2}$ wider than the width of $P_S(\epsilon)$ while the height is (again by an easy calculation) a factor of $\sqrt{2}$ smaller.
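The comparison of widths can be reproduced numerically. The sketch below computes the standard deviation of the density proportional to $\epsilon^N(E-\epsilon)^N$ and of the comparison Gibbs density proportional to $\epsilon^N e^{-\beta\epsilon}$ with $\beta = 2(N+1)/E$, by direct midpoint-rule integration in log space. The cutoff at $3E$ for the Gibbs density and the function names are assumptions made for illustration. For finite $N$ the exact standard deviation of the first (Beta-type) density is $E/2\sqrt{2N+3}$, which agrees with the quoted $E/2\sqrt{2N}$ at large $N$.

```python
import math

def std_dev(log_weight, lo, hi, steps=20000):
    """Standard deviation of the density proportional to exp(log_weight(x)) on [lo, hi]."""
    h = (hi - lo) / steps
    xs = [lo + (i + 0.5) * h for i in range(steps)]
    logs = [log_weight(x) for x in xs]
    top = max(logs)  # subtract the maximum so that exp() cannot overflow
    ws = [math.exp(v - top) for v in logs]
    norm = sum(ws)
    mean = sum(w * x for w, x in zip(ws, xs)) / norm
    return math.sqrt(sum(w * (x - mean) ** 2 for w, x in zip(ws, xs)) / norm)

def widths(N, E=1.0):
    """Return (width of P_S, width of the comparison Gibbs density)."""
    beta = 2.0 * (N + 1) / E
    s_totem = std_dev(lambda e: N * (math.log(e) + math.log(E - e)), 0.0, E)
    # Assumption: the Gibbs density is negligible beyond 3E for the N used here.
    s_gibbs = std_dev(lambda e: N * math.log(e) - beta * e, 0.0, 3.0 * E)
    return s_totem, s_gibbs

s10, g10 = widths(10)
s1000, g1000 = widths(1000)
ratio_large_n = g1000 / s1000
```

At $N = 1000$ the ratio of the two widths is already within a fraction of a percent of $\sqrt{2}$, whereas at $N = 10$ the finite-$N$ corrections are still visible.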



We shall postpone to Section VII a calculation of the (microcanonical and modern) entropies of S and B for general $N_S$ and $N_B$. Suffice it to remark that, like the width, $s$, the microcanonical entropy of S differs, in general, from its value in the comparison thermal state at inverse temperature $\beta = 2(N+1)/E$, albeit the difference is just a 'small'
constant (it is smaller by ~ log 2/2) independent of e. 

Finally, we remark that, in this power-law density-of-states case, it is clear from the developments in this section that the 'canonical' (i.e. thermal) behaviour of $\rho_S^{\rm microc}$ (or indeed of $\rho_S^{\rm modapprox}$) in the case that the system, S, is very much smaller than the energy bath, B, may be reconciled with the above-discussed Gaussian behaviour, when S and B are of comparable size, in that the relationship between the two may be regarded as an instance of the well-known relationship (see e.g. [26] or [27]) between the Poisson and Gaussian distributions in probability theory. (This obviously easily follows from the way we derived, above, both the canonical behaviour and the Gaussian behaviour as limits of the binomial distribution.)

FIG. 2: Plot of the energy probability density, $P_{S,\beta}^{\rm Gibbs}(\epsilon)$, for the thermal state at inverse temperature, $\beta$, on our system, S, with density of states $\sigma(\epsilon) = A\epsilon^N$, for the same ('unusually' small) value $N = 10$ and for $\beta = 22/E$ (i.e. the value of $\beta$ for which the mean energy is $E/2$). To be contrasted with the $P_S(\epsilon)$ of Figure 1.



III. EXPONENTIALLY RISING DENSITIES OF STATES



We now turn to discuss the quite different behaviour of the reduced density operators $\rho_S^{\rm microc}$ and $\rho_S^{\rm modapprox}$ when the densities of states of S and B increase exponentially. We shall confine our interest here to the case where both densities of states, $\sigma_S$ and $\sigma_B$, behave as $ce^{b\epsilon}$ with the same constants $c$ and $b$ in each expression:

$$\sigma_S(\epsilon) = ce^{b\epsilon}, \qquad \sigma_B(\epsilon) = ce^{b\epsilon}. \qquad (34)$$

We remark, however, that, as may quite easily be checked, allowing different values of $c$ (say $c_S$ in the first formula and $c_B$ in the second) will not essentially change our conclusions [29].

The normalization constant M is now easily seen - either by using Q or, on recalling Q, by using ^ - to be 
given by 



$$M = c^2e^{bE}E\Delta. \qquad (35)$$

We note that this will be large provided neither $c$ nor $\Delta$ are 'too small' and provided also

$$bE \gg 1 \qquad (36)$$

which will hold in cases of interest.



The formula, (11), for $\rho_S^{\rm microc}$ is then easily seen to coincide with the formula, (13), for a thermal density operator $\rho_{S,\beta}^{\rm Gibbs}$, for the density of states $\sigma_S(\epsilon)$ as in (34) at inverse temperature $\beta = b$, provided the latter formula is modified so that the sum over $\epsilon$ is truncated at the upper energy, $E$, and the partition function, $Z_{S,\beta}$, is replaced by $cE$. Of course, the un-truncated formula (13) will only make mathematical sense for $\beta > b$. Nevertheless, the reduced density operator $\rho_S^{\rm microc}$ (and similarly also $\rho_B^{\rm microc}$) clearly deserves to be called an approximately thermal state at inverse temperature $b$. (This will generalize from equal systems to comparably sized systems if, by this, we mean systems with densities of states with unequal $c_S$ and $c_B$ as discussed in Endnote [29].) We shall refer to the relevant notion of being approximately thermal here as being $E$-approximately thermal.

FIG. 3: Plot of the energy probability density (37), $P_S(\epsilon)$, in the case S and B have the same density of states $\sigma(\epsilon) = ce^{b\epsilon}$.

Turning from the traditional total microcanonical state approach to the modern total pure state approach, we see, on substituting (34) into (15) and noting that $E_c$ will obviously become $E/2$, that the $\epsilon$-summand in (15) still agrees with the $\epsilon$-summand in (13) at inverse temperature $\beta = b$ up to energy $E/2$ and, moreover, as always (cf. after Equation (15)) the system energy probability density of $\rho_S^{\rm modapprox}$ is equal to that of $\rho_S^{\rm microc}$ and thus it agrees with the energy probability density of a Gibbs state, for the same density of states, up to energy $E$. We shall refer to the relevant notion of being approximately thermal here (i.e. agreement of the summand in the formula (15) with the summand in the formula (13) up to $\epsilon = E/2$ - with a suitable change in the value of $Z_{S,\beta}$ - and agreement of the energy probability density up to $E$) as being $E$-approximately semi-thermal.

We note here that, with the densities of states as in (34), the energy probability density $P_S(\epsilon)$, which we recall by (10) is given in general by

$$P_S(\epsilon) = \frac{\Delta}{M}\,\sigma_S(\epsilon)\,\sigma_B(E-\epsilon),$$

will, with $M$ as in (35) and $\sigma_S$ and $\sigma_B$ as in (34), reduce to

$$P_S(\epsilon) = \frac{1}{E}. \qquad (37)$$

See Figure 3. Of course (cf. the paragraph after Equation (32)) the energies of S and B will, again, be perfectly anticorrelated.

Similar results to the above results for S will obviously hold for B. We thus conclude, in fulfillment of our promise (cf. the start of Section I C) that, with the appropriate meaning in each case for the expression "approximately thermal", as above, when the densities of states of S and B take the exponential form of (34) then - in contrast to the situation for power-law densities of states discussed in Section II - on both the traditional total microcanonical state approach and also on the modern total pure state approach, the reduced density operators of both S and B will be approximately thermal (at inverse temperature $\beta = b$) in an appropriate sense.



IV. GENERAL FORMULAE FOR MEAN ENERGY AND FOR ENTROPY 



It is interesting (especially in preparation for the discussion in Section IX and in our companion papers [22, 23] about the connection with quantum black hole physics both of the results in Sections III and V concerning densities of states which grow exponentially with energy and, in Section VI, concerning those which grow as quadratic exponentials in the energy) to calculate the mean energy, $\bar\epsilon$, and also the von Neumann entropy, $S$, for each of the density operators, $\rho_S^{\rm microc}$, $\rho_S^{\rm modapprox}$ and also for $\rho_B^{\rm microc}$, $\rho_B^{\rm modapprox}$. The former density operator will just give the usual mean energy and von Neumann entropy (defined as in (16)) of our system, S, when the totem is in the microcanonical ensemble. The latter density operator will, according to our proposition in Section I D, give an energy value and an entropy value which will very probably be very close to the mean value of the energy and the entropy of our system, S, when the totem is in a random pure state.



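By (11), (15), and (7), we have, in general, that, with obvious notation,

$$\bar\epsilon_S^{\rm microc} := \mathrm{tr}(\rho_S^{\rm microc}H_S) = M^{-1}\sum_{\epsilon=\Delta}^{E}\epsilon\, n_S(\epsilon)\,n_B(E-\epsilon).$$

Similarly

$$\bar\epsilon_B^{\rm microc} := \mathrm{tr}(\rho_B^{\rm microc}H_B) = M^{-1}\sum_{\epsilon=\Delta}^{E}\epsilon\, n_B(\epsilon)\,n_S(E-\epsilon)$$

and one easily sees that necessarily $\bar\epsilon_B^{\rm microc} = E - \bar\epsilon_S^{\rm microc}$. Moreover, we have

$$\bar\epsilon_S^{\rm modapprox} := \mathrm{tr}(\rho_S^{\rm modapprox}H_S) \qquad (38)$$

$$= M^{-1}\left(\sum_{\epsilon=\Delta}^{E_c}\epsilon\, n_B(E-\epsilon)\,n_S(\epsilon) + \sum_{\epsilon=E_c+\Delta}^{E}\epsilon\, n_S(\epsilon)\,n_B(E-\epsilon)\right) \qquad (39)$$

which is easily seen to be equal to $\bar\epsilon_S^{\rm microc}$. Similarly, $\bar\epsilon_B^{\rm modapprox} = \bar\epsilon_B^{\rm microc}$.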
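On the other hand, by (11) and (16), we will have

$$S_S^{\rm microc} := -\mathrm{tr}(\rho_S^{\rm microc}\log\rho_S^{\rm microc}) = -M^{-1}\sum_{\epsilon=\Delta}^{E} n_S(\epsilon)\,n_B(E-\epsilon)\log\left(M^{-1}n_B(E-\epsilon)\right) \qquad (40)$$

and by (15) and (16)

$$S_S^{\rm modapprox} := -\mathrm{tr}(\rho_S^{\rm modapprox}\log\rho_S^{\rm modapprox})$$

$$= -M^{-1}\left(\sum_{\epsilon=\Delta}^{E_c} n_S(\epsilon)\,n_B(E-\epsilon)\log\left(M^{-1}n_B(E-\epsilon)\right) + \sum_{\epsilon=E_c+\Delta}^{E} n_S(\epsilon)\,n_B(E-\epsilon)\log\left(M^{-1}n_S(\epsilon)\right)\right) \qquad (41)$$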



and similarly with S replaced by B. We remark that it is not difficult to see from (15) and the counterpart equation for $\rho_B^{\rm modapprox}$ that $S_S^{\rm modapprox}$ will necessarily equal $S_B^{\rm modapprox}$. This is of course consistent with the fact that, by the general result recalled in Endnote [48], for any pure totem state, $\Psi$, we necessarily have that the von Neumann entropies of the resulting reduced density operators $\rho_S^{\rm modern}$ and $\rho_B^{\rm modern}$ will necessarily be equal. After all, as we claim in our Proposition in Section I D and argue in Part 2, for random $\Psi$, $\rho_S^{\rm modapprox}$ most probably gives a very good approximation of $\rho_S^{\rm modern}$, and $\rho_B^{\rm modapprox}$ of $\rho_B^{\rm modern}$.

By referring to the second equality in (10), it is easy to see that the formulae for $S_S^{\rm microc}$ and $S_S^{\rm modapprox}$ in (40) and (41) may be rearranged to give the following useful alternative expressions:

$$S_S^{\rm microc} = \Delta\sum_{\epsilon=\Delta}^{E} P_S(\epsilon)\log\left(\frac{n_S(\epsilon)}{\Delta P_S(\epsilon)}\right) \qquad (42)$$

$$S_S^{\rm modapprox} = \Delta\left(\sum_{\epsilon=\Delta}^{E_c} P_S(\epsilon)\log\left(\frac{n_S(\epsilon)}{\Delta P_S(\epsilon)}\right) + \sum_{\epsilon=E_c+\Delta}^{E} P_S(\epsilon)\log\left(\frac{n_B(E-\epsilon)}{\Delta P_S(\epsilon)}\right)\right) \qquad (43)$$
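The equivalence between the direct expressions (40), (41) and the rearranged expressions (42), (43) can be checked numerically for any choice of $n_S$ and $n_B$. The following is a minimal sketch under stated assumptions: energies are positive-integer multiples of $\Delta$, $M$ is taken to be the sum of $n_S(\epsilon)n_B(E-\epsilon)$ over those energies (so that $\Delta P_S(\epsilon)$ sums to 1), and $E_c$ is simply passed in as an input rather than computed. The function name and the sample densities are hypothetical.

```python
import math

def entropies(n_s, n_b, E, delta, E_c):
    """Evaluate Eqs. (40)-(43) for callables n_s(eps) and n_b(eps).

    Returns ((S_microc, S_modapprox) via (40)/(41),
             (S_microc, S_modapprox) via (42)/(43)).
    """
    steps = round(E / delta)
    k_c = round(E_c / delta)
    rows = []
    for k in range(1, steps + 1):  # energies are positive-integer multiples of delta
        eps = k * delta
        ns, nb = n_s(eps), n_b(max(E - eps, 0.0))
        rows.append((k, ns, nb))
    M = sum(ns * nb for _, ns, nb in rows)  # assumption: M normalises the weights

    s40 = s41 = s42 = s43 = 0.0
    for k, ns, nb in rows:
        if ns * nb <= 0.0:
            continue  # zero-probability energies contribute nothing
        prob = ns * nb / M  # equals delta * P_S(eps)
        s40 -= prob * math.log(nb / M)
        s42 += prob * math.log(ns / prob)
        if k <= k_c:
            s41 -= prob * math.log(nb / M)
            s43 += prob * math.log(ns / prob)
        else:
            s41 -= prob * math.log(ns / M)
            s43 += prob * math.log(nb / prob)
    return (s40, s41), (s42, s43)

# Illustrative power-law state counts; E_c = 1.6 is an arbitrary sample input.
direct, rearranged = entropies(lambda e: 50.0 * e**2, lambda e: 80.0 * e**3,
                               E=4.0, delta=0.01, E_c=1.6)
```

Since (42) and (43) are exact rearrangements of (40) and (41), the two returned pairs should agree to rounding error whatever value of $E_c$ is supplied.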



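Replacing the sums by integrals (and with the proviso made in the cautionary remark in [14]) we see that (42) and (43) have, as their continuum versions:

$$S_S^{\rm microc} = \int_0^E P_S(\epsilon)\log\left(\frac{\sigma_S(\epsilon)}{P_S(\epsilon)}\right)d\epsilon, \qquad (44)$$

$$S_S^{\rm modapprox} = \int_0^{E_c} P_S(\epsilon)\log\left(\frac{\sigma_S(\epsilon)}{P_S(\epsilon)}\right)d\epsilon + \int_{E_c}^{E} P_S(\epsilon)\log\left(\frac{\sigma_B(E-\epsilon)}{P_S(\epsilon)}\right)d\epsilon. \qquad (45)$$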



We notice, in passing, that the absence of the quantity $\Delta$ (or of any quantity that scales with $\Delta$) in the formulae for $S_S^{\rm microc}$ and $S_S^{\rm modapprox}$ shows us the interesting fact that (for $\Delta$ in an appropriate not-too-large and not-too-small range, and to what, in typical applications, will be an extremely good approximation) neither of these entropies depends on $\Delta$!

Finally, further useful insight concerning the form of Equations (42) and (43) can be had by noticing that they can alternatively be derived as corollaries of the following easily proved Lemma, which we will also need to refer to in Section XIII in Part 2.

Lemma: Given density operators $\rho_1, \rho_2, \ldots$ on Hilbert spaces $H_1, H_2, \ldots$ respectively, with von Neumann entropies $S(\rho_1), S(\rho_2), \ldots$ and given positive real numbers $\lambda_1, \lambda_2, \ldots$ with $\sum_i\lambda_i = 1$. Then the density operator

$$\rho = \lambda_1\rho_1 \oplus \lambda_2\rho_2 \oplus \ldots$$

on the direct sum Hilbert space $H = H_1 \oplus H_2 \oplus \ldots$ will have an entropy, $S$, given by

$$S = \sum_i\lambda_iS(\rho_i) - \sum_i\lambda_i\log\lambda_i. \qquad (46)$$
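The Lemma is easy to verify numerically when each density operator is represented by its eigenvalues, since the spectrum of a direct sum is the union of the rescaled spectra of its blocks. The sketch below assumes diagonal blocks and arbitrary illustrative weights; the helper names are hypothetical.

```python
import math

def entropy(probs):
    """-sum p log p, ignoring zero entries."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def direct_sum_spectrum(lambdas, spectra):
    """Eigenvalues of rho = lambda_1 rho_1 (+) lambda_2 rho_2 (+) ..."""
    return [lam * p for lam, spec in zip(lambdas, spectra) for p in spec]

# Illustrative weights and block spectra (each block spectrum sums to 1).
lambdas = [0.5, 0.3, 0.2]
spectra = [[0.7, 0.2, 0.1], [0.5, 0.5], [1.0]]

lhs = entropy(direct_sum_spectrum(lambdas, spectra))
# Right-hand side of Eq. (46): weighted block entropies plus the entropy of the weights.
rhs = sum(lam * entropy(spec) for lam, spec in zip(lambdas, spectra)) + entropy(lambdas)
```

Here `lhs` is the entropy of the full direct-sum operator and `rhs` is the right-hand side of (46); the two should coincide up to rounding.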



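To apply this lemma to the calculation of $S_S^{\rm microc}$ and $S_S^{\rm modapprox}$ (for general densities of states) we first notice that (11) and (15) can be rewritten as

$$\rho_S^{\rm microc} = \bigoplus_{\epsilon=\Delta}^{E}\lambda_\epsilon\rho_\epsilon \qquad (47)$$

and

$$\rho_S^{\rm modapprox} = \bigoplus_{\epsilon=\Delta}^{E_c}\lambda_\epsilon\rho_\epsilon \;\oplus \bigoplus_{\epsilon=E_c+\Delta}^{E}\lambda_\epsilon\tilde\rho_\epsilon \qquad (48)$$

where

$$\rho_\epsilon = \frac{1}{n_S(\epsilon)}\sum_{i=1}^{n_S(\epsilon)}|\epsilon,i\rangle\langle\epsilon,i| \qquad (49)$$

and

$$\tilde\rho_\epsilon = \frac{1}{n_B(E-\epsilon)}\sum_{i=1}^{n_B(E-\epsilon)}|\widetilde{\epsilon,i}\rangle\langle\widetilde{\epsilon,i}| \qquad (50)$$

where $|\epsilon,i\rangle$ and $|\widetilde{\epsilon,i}\rangle$ are as in (11) and (15), and where (recalling that the sums in (11) and (15) are over energies, $\epsilon$, which are integral multiples of $\Delta$)

$$\lambda_\epsilon = \Delta P_S(\epsilon) \qquad (51)$$

where $P_S(\epsilon)$ is the energy probability density (10).

We also easily see from (49) and (50) that, in general,

$$S(\rho_\epsilon) = \log(n_S(\epsilon)) \quad\text{and}\quad S(\tilde\rho_\epsilon) = \log(n_B(E-\epsilon)). \qquad (52)$$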



Equations (42) and (43) now follow by simple applications of the formula (46) of our above lemma to (47) and (48).

V. FORMULAE FOR MEAN ENERGY AND ENTROPY FOR EXPONENTIALLY RISING DENSITIES OF STATES



Specialising to $n_S$ and $n_B$ given by $n_S(\epsilon) = \sigma_S(\epsilon)\Delta$ and $n_B(\epsilon) = \sigma_B(\epsilon)\Delta$ with $\sigma_S$ and $\sigma_B$ as in (34), we have that

$$\bar\epsilon_S^{\rm microc} = \bar\epsilon_S^{\rm modapprox} = E/2, \qquad (53)$$

and similarly with S replaced by B. These results for the mean energies of S and B are of course anyway obvious since the assumption of very weak coupling (see (2) and the subsequent paragraph) implies that the mean energies of S and B will add to $E$, while (34) is symmetric under the replacement of S by B. However, we remark that (53) turns out to remain exactly unchanged even when the densities of states are generalized so as to have different pre-factors $c_S$ and $c_B$ (see Endnote [29]). (Returning to the case of equal densities of states) we caution that the mean energies are just that, averages; they are not in any sense 'most likely' energies. In fact, as we saw in Section III, the energy probability density (see (37) and Figure 3) is flat!

It is also straightforward to calculate, using the formulae of Section IV, that the entropies take the values

$$S_S^{\rm microc} = bE/2 + \log(cE), \tag{54}$$

$$S_S^{\rm modapprox} = bE/4 + \log(cE), \tag{55}$$

and similarly with S replaced by B. (Again, see Endnote [29] for the generalization to different prefactors, $c_S$ and $c_B$,
in the first and second formulae of (34).)

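In particular, inserting the formulae (34) and (37) for $\sigma_S$ and $P_S$ in (42) and (43) we obtain

$$S_S^{\rm microc} = \sum_{\epsilon=\Delta}^{E} \frac{\Delta}{E}\left(b\epsilon + \log(c\Delta) - \log(\Delta/E)\right) = \frac{\Delta}{E}\left[\frac{b\Delta}{2}\,(E/\Delta)(E/\Delta + 1) + (E/\Delta)\log(cE)\right] \simeq \frac{bE}{2} + \log(cE) \tag{56}$$

while (assuming $E/\Delta$ is even)

$$S_S^{\rm modapprox} = \sum_{\epsilon=\Delta}^{E/2} \frac{\Delta}{E}\left(b\epsilon + \log(c\Delta) - \log(\Delta/E)\right) + \sum_{\epsilon=E/2+\Delta}^{E} \frac{\Delta}{E}\left(b(E-\epsilon) + \log(c\Delta) - \log(\Delta/E)\right)$$

$$\simeq 2\sum_{\epsilon=\Delta}^{E/2} \frac{\Delta}{E}\left(b\epsilon + \log(c\Delta) - \log(\Delta/E)\right) = \frac{\Delta}{E}\left[2\,\frac{b\Delta}{2}\,(E/2\Delta)(E/2\Delta + 1) + (E/\Delta)\log(cE)\right] \simeq \frac{bE}{4} + \log(cE) \tag{57}$$

which are the formulae (54) and (55). In the calculations above, we need to recall that the sums in (11) and (15), and
hence also in the direct sums in (47) and (48) and in the above equations, are over $\epsilon$ values which are positive-integer
multiples of $\Delta$.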
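The discrete sums in (56) and (57) are easy to evaluate directly. The sketch below assumes the flat weights $P_S(\epsilon)\Delta = \Delta/E$ used above and $\log n(\epsilon) = b\epsilon + \log(c\Delta)$; the function name and the parameter values are illustrative choices, not taken from the paper.

```python
import math

def entropies_exponential(b, c, E, delta):
    """Evaluate the discrete sums (56) and (57) for sigma(e) = c * exp(b * e).

    Energies run over e = delta, 2*delta, ..., E with weight
    P_S(e) * delta = delta / E; E / delta is assumed to be even.
    """
    steps = round(E / delta)
    weight = delta / E                                  # lambda_e = P_S(e) * delta
    s_microc = 0.0
    s_modapprox = 0.0
    for k in range(1, steps + 1):
        e = k * delta
        log_n_s = b * e + math.log(c * delta)           # log n_S(e)
        log_n_b = b * (E - e) + math.log(c * delta)     # log n_B(E - e)
        s_microc += weight * (log_n_s - math.log(weight))
        # modapprox: n_S is used up to E/2 and n_B(E - e) above it
        log_n = log_n_s if k <= steps // 2 else log_n_b
        s_modapprox += weight * (log_n - math.log(weight))
    return s_microc, s_modapprox

# Illustrative parameters (hypothetical): b = c = 1, E = 100, delta = 0.01.
s_microc, s_modapprox = entropies_exponential(1.0, 1.0, 100.0, 0.01)
print(s_microc, 100.0 / 2 + math.log(100.0))      # compare with (54)
print(s_modapprox, 100.0 / 4 + math.log(100.0))   # compare with (55)
```

For small $\Delta$ the two sums should approach $bE/2 + \log(cE)$ and $bE/4 + \log(cE)$ respectively, with the $\Delta$-dependence dropping out.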



We remark that the leading behaviour of $S_S^{\rm microc}$ (54) (i.e. the term, $bE/2$, which remains when we ignore the
logarithmic terms in (54)) arises, in (say) the continuum version, (44), of our general formula for $S_S^{\rm microc}$ by replacing
the logarithm in this formula by its 'main part', by which we mean the exponent, $b\epsilon$, in the formula, (34), $\sigma_S(\epsilon) = ce^{b\epsilon}$.
Similarly, the leading behaviour of $S_S^{\rm modapprox}$ (i.e. the term $bE/4$ in (55)) arises by setting $E_c = E/2$ in (45) and
noticing that the main parts of the two logarithms in this formula are (in order) $b\epsilon$ and $b(E - \epsilon)$.

FIG. 4: Plot of the energy probability density (60), $P_S(\epsilon)$, in the case S and B have the same density of states $\sigma(\epsilon) = Ke^{q\epsilon^2}$
for the (unrealistically small) value $qE^2 = 10$.

VI. DENSITIES OF STATES WHICH GROW AS QUADRATIC EXPONENTIALS 

Next we discuss the behaviour of the reduced density operators, $\rho_S^{\rm microc}$ and $\rho_S^{\rm modapprox}$, when the densities of states
of S and B each increase as the exponential of a constant times the square of the energy. We shall confine our interest
to the case where both densities of states, $\sigma_S$ and $\sigma_B$, behave as $Ke^{q\epsilon^2}$ with the same constants, $K$ and $q$:

$$\sigma_S(\epsilon) = Ke^{q\epsilon^2}, \qquad \sigma_B(\epsilon) = Ke^{q\epsilon^2} \tag{58}$$

and shall just discuss the cases of $\rho_S^{\rm microc}$ and $\rho_S^{\rm modapprox}$, those of $\rho_B^{\rm microc}$ and $\rho_B^{\rm modapprox}$ obviously being similar.

We shall assume that

$$qE^2 \gg 1. \tag{59}$$

The energy probability density, $P_S(\epsilon)$ (10), now takes the form

$$P_S(\epsilon) = \frac{\Delta}{M}K^2 e^{qE^2} e^{-2q\epsilon(E-\epsilon)} \tag{60}$$

and we sketch its graph in Figure 4.

We note that it is symmetric about $E/2$ and also, in view of (59), $P_S(\epsilon)$ is very close to zero except when $\epsilon$ is close
to 0 or to $E$, where it is well approximated by the exponentially decaying function $\frac{\Delta}{M}K^2e^{qE^2}e^{-2qE\epsilon}$ (near $\epsilon = 0$),
and by the exponentially rising function $\frac{\Delta}{M}K^2e^{qE^2}e^{2qE(\epsilon-E)}$ (near $\epsilon = E$). Approximating the integral from 0 to $E$
of $P_S(\epsilon)$ by the sum of the (equal) integrals (from 0 to $\infty$ and from $-\infty$ to $E$) of these exponential approximations,
and demanding that the result must equal 1 we see that $P_S(\epsilon)$ will be well approximated on its domain $[0, E]$ by

$$P_S(\epsilon) \simeq qE\left(e^{-2qE\epsilon} + e^{2qE(\epsilon-E)}\right) \tag{61}$$

from which we infer that the normalization constant, $M$, will be well approximated [42] by

$$M \simeq K^2\Delta\,\frac{e^{qE^2}}{qE}. \tag{62}$$

FIG. 5: Comparison figure for the (non-normalizable!) "energy probability density" (64) of the "thermal state", $P_S^{\rm Gibbs}(\epsilon)$, at
inverse temperature $\beta = 2qE$, again for the value $qE^2 = 10$.
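The approximation (62) for the normalization constant $M$ can be compared with the exact discrete sum $M = \sum_\epsilon n_S(\epsilon)n_B(E-\epsilon)$. The sketch below works with logarithms to avoid overflow; the parameter values are hypothetical and chosen only so that $qE^2 \gg 1$ and $\Delta \ll 1/(qE)$.

```python
import math

def log_M_exact(K, q, E, delta):
    """log M for M = sum_e n_S(e) n_B(E - e), with n = sigma * delta
    and e running over delta, 2*delta, ..., E."""
    steps = round(E / delta)
    # exp(q E^2) is factored out so that the summands stay of order one
    total = sum(math.exp(-2.0 * q * (k * delta) * (E - k * delta))
                for k in range(1, steps + 1))
    return 2.0 * math.log(K * delta) + q * E * E + math.log(total)

def log_M_approx(K, q, E, delta):
    """log of the right-hand side of (62): K^2 * delta * exp(q E^2) / (q E)."""
    return 2.0 * math.log(K) + math.log(delta) + q * E * E - math.log(q * E)

# Illustrative parameters (hypothetical): q E^2 = 50.
K, q, E, delta = 1.0, 50.0, 1.0, 1e-4
difference = log_M_exact(K, q, E, delta) - log_M_approx(K, q, E, delta)
print(difference)
```

The difference of the two logarithms should be small compared with 1 when $qE^2 \gg 1$; expanding the exponent suggests a relative correction to $M$ of order $1/(qE^2)$.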



We pause here to record that (cf. (53)) either by calculation or by the above-noted symmetry of $P_S(\epsilon)$, we will
obviously have that the mean energy of both S and B will be $E/2$:

$$\bar\epsilon_S^{\rm microc} = \bar\epsilon_S^{\rm modapprox} = E/2. \tag{63}$$

However, (cf. our remark after Equation (53)) even more emphatically than in the case of exponentially rising
densities of states, these are not 'most likely energies'. In fact, the energy probability density, $P_S(\epsilon)$, tells us that the
energy (say of S) will be highly likely, and with equal likelihoods, either to be close to 0 or to be close to $E$ - and
highly unlikely to be close to $E/2$ (and similarly for B). In addition of course (cf. after Equation (32) and Equation
(37)) the energy of S and the energy of B will be perfectly anticorrelated. So when the energy of S is near 0, the
energy of B will be near $E$, and when the energy of S is near $E$, the energy of B will be near 0.

In analogy to what we did in Sections III and V we next wish to compare the formulae (60) and (61) for $P_S(\epsilon)$ with

$$P_S^{\rm Gibbs}(\epsilon) = CKe^{q\epsilon^2}e^{-\beta\epsilon} \tag{64}$$

where $C$ is a suitable constant, which, but for the fact that it is not integrable on the interval $(0, \infty)$ (for any value
of $\beta$!) would deserve to be called 'the energy probability density of a thermal state at inverse temperature $\beta$' (cf.
(17)) for the same density of states (58). If $\beta E \gg 1$, then, on the interval $[0, \beta/q]$, $P_S^{\rm Gibbs}(\epsilon)$ will be very close to zero
except when $\epsilon$ is close to 0 or to $\beta/q$, where it will be well approximated by the exponential decay $CKe^{-\beta\epsilon}$ (near 0),
and by the exponentially rising function $CKe^{\beta(\epsilon-\beta/q)}$ (near $\epsilon = \beta/q$). Put otherwise, on the interval $[0, \beta/q]$, $P_S^{\rm Gibbs}(\epsilon)$
will take the approximate form

$$P_S^{\rm Gibbs}(\epsilon) \simeq CK\left(e^{-\beta\epsilon} + e^{\beta(\epsilon-\beta/q)}\right). \tag{65}$$

Beyond $\epsilon = \beta/q$, $P_S^{\rm Gibbs}(\epsilon)$ will, of course, grow rapidly. If we now choose to make the identification

$$\beta = 2qE, \tag{66}$$

we see that (65) can be written

$$P_S^{\rm Gibbs}(\epsilon) \simeq CK\left(e^{-\beta\epsilon} + e^{\beta(\epsilon-2E)}\right) \tag{67}$$

while (61) can be written

$$P_S(\epsilon) \simeq qE\left(e^{-\beta\epsilon} + e^{\beta(\epsilon-E)}\right) \tag{68}$$

and comparison of (67) and (68) or a glance at Figures 4 and 5 immediately shows that these resemble one another
closely - each having an equally rapidly exponentially decaying peak located near $\epsilon = 0$ and each having an equally
rapidly exponentially rising, equal-sized second peak - the only discrepancy being that, in the case of $P_S(\epsilon)$, the
second peak is located near $\epsilon = E$ while, in the case of $P_S^{\rm Gibbs}(\epsilon)$, it is located near $\epsilon = 2E$. Thus, at low energies -
and indeed at all energies, $\epsilon$, up to a little below $E$ - the energy probability density, $P_S(\epsilon)$, is closely approximated
by the thermal energy probability density for $\beta = 2qE$, while there is a qualitative resemblance between $P_S(\epsilon)$ on its
full interval $[0, E]$ and $P_S^{\rm Gibbs}(\epsilon)$ on the interval $[0, 2E]$ (with the above-mentioned quantitative discrepancy that the
second peak in $P_S(\epsilon)$ occurs near $E$ while the second peak in $P_S^{\rm Gibbs}(\epsilon)$ occurs near $2E$).

Moreover, one may easily check that, except for discrepancies corresponding to the above discrepancy for the energy
probability densities, the reduced density operators, $\rho_S^{\rm microc}$ (defined as in (11)) and $\rho_S^{\rm modapprox}$ (defined as in (15)),
respectively, will stand in relation to $\rho_S^{\rm Gibbs}$ (defined as in (13)) for $\beta = 2qE$, in a similar way to the relationships
which we termed '$E$-approximately thermal' and '$E$-approximately semi-thermal' in Section III.

In conclusion, except for the discrepancy pointed out above, we may say that, in contrast again to the situation
for power-law densities of states and with many similarities (but also a few differences) to what we found in Sections
III and V for densities of states which grow exponentially with energy, also densities of states which grow, (58), as
quadratic exponentials lead to reduced density operators on S and B which are, in the sense we have explained above,
approximately thermal.

Next we turn to calculate the von Neumann entropies of $\rho_S^{\rm microc}$ and $\rho_S^{\rm modapprox}$ when the densities of states are as
in (58).

In the spirit of the last paragraph of Section V we expect the leading term in $S_S^{\rm microc}$ to be given by

$$S_S^{\rm microc} \simeq \int_0^E P_S(\epsilon)\,q\epsilon^2\,d\epsilon \simeq \frac{qE^2}{2} \tag{69}$$

where, for the first approximate equality, we have replaced the logarithm in (44) by its 'main part' - i.e. by the
exponent, $q\epsilon^2$, in the first equation in (58), and, for the second approximate equality, we have used the fact that the
energy probability density, $P_S(\epsilon)$ (see (61) and Figure 4) consists of two sharp peaks, each of area 1/2, one located at
$\epsilon = 0$ and one at $\epsilon = E$.

Proceeding similarly for $S_S^{\rm modapprox}$ we similarly expect the leading term to be given by approximating (45) by

$$S_S^{\rm modapprox} \simeq \int_0^{E/2} P_S(\epsilon)\,q\epsilon^2\,d\epsilon + \int_{E/2}^{E} P_S(\epsilon)\,q(E-\epsilon)^2\,d\epsilon \simeq 0. \tag{70}$$

It is straightforward to check that the error in both (69) and (70) is only of order 1 in $E$; one needs only to be
careful to realize that this is one situation in which (cf. Endnote [14]) it is important to work with the discrete sum
versions, (40) and (41) or alternatively (42) and (43), of our entropy formulae; if one were to work unthinkingly with
(44) and (45), one might (wrongly) conclude there is a (for some values of $K$, $q$ and $E$, negative!) correction to both
equations of form $\log(K/qE) + O(1)$ - the problem being caused by the steeply rising behaviour of $P_S(\epsilon)$ near $\epsilon = 0$
and $\epsilon = E$.

Thus, for our densities of states which grow as quadratic exponentials, there is an even more dramatic difference
between the value of $S_S^{\rm microc}$ and the value of $S_S^{\rm modapprox}$ than we found, in Section V, for densities of states which
grow exponentially with energy (where they differed by a factor of 2).
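The leading-term integrals in (69) and (70) can be estimated numerically. The sketch below assumes $P_S(\epsilon) \propto e^{-2q\epsilon(E-\epsilon)}$ as in (60), normalizes it numerically with a simple midpoint rule, and uses hypothetical parameter values with $qE^2 = 50$; it addresses only these leading-term integrals, not the full discrete entropy formulae.

```python
import math

def leading_terms(q, E, steps=200_000):
    """Midpoint-rule estimates of the integrals in (69) and (70).

    P_S(e) is taken proportional to exp(-2*q*e*(E - e)), as in (60),
    and is normalised numerically on [0, E].
    """
    h = E / steps
    norm = 0.0
    microc = 0.0
    modapprox = 0.0
    for k in range(steps):
        e = (k + 0.5) * h
        p = math.exp(-2.0 * q * e * (E - e))        # unnormalised P_S(e)
        norm += p * h
        microc += p * q * e * e * h                 # integrand of (69)
        nearer = min(e, E - e)                      # e below E/2, E - e above it
        modapprox += p * q * nearer * nearer * h    # integrand of (70)
    return microc / norm, modapprox / norm

# Illustrative parameters (hypothetical): q = 50, E = 1, so q E^2 / 2 = 25.
microc_term, modapprox_term = leading_terms(50.0, 1.0)
print(microc_term, modapprox_term)
```

The first number should lie within order 1 of $qE^2/2$ and the second should be close to zero, in line with the statement that the errors in (69) and (70) are of order 1.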



VII. MORE ABOUT ENTROPY 



Note: The reader may wish to skip this, and the next, section on a first reading and go directly to Section IX.

A. Special facts about the entropy when the energy probability density is sharply peaked

In the case of our power-law density of states example, an alternative way of arguing that the location of the peak
of the energy probability density, $P_S(\epsilon)$, of the system, S, is given by the formula (31) is to assume foreknowledge of
the existence of a (single) peak in the energy probability density, $P_S(\epsilon)$, in the interior of the energy-interval $[0, E]$
and then to note that, by (10), this must occur at an energy, $\epsilon$, for which

$$\frac{d\log\sigma_S(\epsilon)}{d\epsilon} + \frac{d\log\sigma_B(E-\epsilon)}{d\epsilon} = 0 \tag{71}$$

which easily implies (31).

In a popular approach (cf. also [43]) to such problems involving the microcanonical ensemble of a pair of weakly
coupled systems with such a (say unique, interior) peak, such a calculation often appears in the following guise:

One writes $\epsilon = \epsilon_1$ and $E - \epsilon = \epsilon_2$. One calls $\log\sigma_S(\epsilon)$ "$S_1(\epsilon_1)$" and one calls $\log\sigma_B(E - \epsilon)$ "$S_2(\epsilon_2)$". Then one
writes the equations

$$\epsilon_1 + \epsilon_2 = E,$$

$$\frac{dS_1}{d\epsilon_1} - \frac{dS_2}{d\epsilon_2} = 0,$$

(equivalent to (71)) and

$$\frac{d^2S_1}{d\epsilon_1^2} + \frac{d^2S_2}{d\epsilon_2^2} < 0 \tag{72}$$

(expressing the fact that it is a peak and not a trough).

It is often then assumed, or, at least, tacitly implied, that $S_1(\epsilon_1)$ and $S_2(\epsilon_2)$ are the "entropies" of Systems 1 and
2 (our system S and 'energy bath' B) and that $dS_1/d\epsilon_1$ and $dS_2/d\epsilon_2$ are the "temperatures" of Systems 1 and 2.

Finally, the equation (72) is interpreted as telling us that Systems 1 and 2 are in "stable equilibrium".
Concerning this popular approach, we would remark and emphasize: 

(a) $S_1(\epsilon_1)$ and $S_2(\epsilon_2)$ (our $\log\sigma_S(\epsilon)$ and $\log\sigma_B(E - \epsilon)$) are not entropies (they are logarithms of densities of states).
To make sense of the logarithms one would, at least, need to multiply $\sigma_S(\epsilon)$ and $\sigma_B(E - \epsilon)$ by "constants" with the
dimensions of energy, first, to make the overall arguments of the logarithms dimensionless. This may not matter if the
resulting logarithm is anyway destined to be differentiated with respect to $\epsilon$ to define a 'temperature' (see Paragraph
(b) below). However it will matter if one wishes to talk meaningfully about the logarithms themselves (evaluated at
the peak values of $\epsilon$ and $E - \epsilon$) as 'entropies'. One could, of course, insert, in each logarithm, an arbitrary constant
with the dimensions of energy, and try to argue that it doesn't make much difference, in practice, what is the precise
value of this constant provided it is of a "reasonable" order of magnitude. But, even if it were the case that all that
was at stake was such a "constant", one would expect, in a fundamental understanding of the origin of entropy, its
value to be determined in terms of the physical parameters of the problem. In fact, as we shall see below, what
actually needs to be inserted is not a constant, but rather (for given system and energy-bath densities of states) a
quantity (which we call $Q$ below) with the dimensions of energy which (like the peak values of $\epsilon$ and $E - \epsilon$ themselves)
depends on the totem energy, $E$.

(b) It is true that one can think of $dS_1/d\epsilon_1$ and $dS_2/d\epsilon_2$ as 'energy-dependent temperatures' in the sense (cf. Section
I A and Endnote pQ) that, if System 1 had energy $\epsilon_1$ and were uncoupled to System 2, but, rather, coupled to another
and much smaller system, then that smaller system would likely get itself into a thermal state at the temperature
$dS_1/d\epsilon_1$ evaluated at $\epsilon_1$ (and similarly for System 2). However, in the 'equilibrium' in question, where System 1 and
System 2 are coupled to one another and neither can be regarded as 'small', neither System 1 nor System 2 is in a
thermal state (as we have shown in Section II for our power-law case)!

(c) Finally, this 'popular' point of view is only of value in cases where the energy probability density (say of System
1) has a peak. Whereas, we emphasize, as explained in this paper, one still predicts definite energy probability density
functions when System 1 and System 2 have densities of states (such as our equal exponential and equal quadratic
exponential cases discussed in Sections III and VI) which do not lead to an interior peak. (In the equal exponential
case, we find an energy probability density which is flat, and, in the quadratic exponential case, it is concave with
peaks at the extremities of the range $[0, E]$ which are not 'maxima' in the sense of the above equations for $S_1$ and
$S_2$.) In such latter cases, the significant quantity of interest is not the location of a peak (there may even be no peak)
but rather the full energy probability density function itself.

We turn next to the value of the entropy of the system in cases where, for given system and energy-bath densities
of states, and given total energy, $E$, the energy probability density, $P_S(\epsilon)$, has a single peak, say at $\epsilon = \epsilon_0$. We are,
of course, interested in both of the entropies, $S_S^{\rm microc}$ and $S_S^{\rm modapprox}$. As we anticipated in Point (a) above, when
the totem is in the microcanonical ensemble with energy in our interval $[E, E + \Delta]$, $S_S^{\rm microc}$ will be $\log(Q\sigma_S(\epsilon_0))$
for a suitable quantity, $Q$, with the dimensions of energy determined by the parameters of the problem, although
we emphasize again that $Q$ is not a "constant"; rather (for given system and energy-bath densities of states) it is
a certain function of totem energy, $E$, which has the dimensions of energy and which is determined by the detailed
shape of the peak in $P_S(\epsilon)$. In fact, by the general formula (44) (one can see that the issues mentioned in Endnote
[14] will not be relevant for a sharp peak which is well inside the interior of the interval $[0, E]$) we will have

$$S_S^{\rm microc} = \int_0^E P_S(\epsilon)\log(L\sigma_S(\epsilon))\,d\epsilon - \int_0^E P_S(\epsilon)\log(LP_S(\epsilon))\,d\epsilon$$

where we have temporarily introduced an arbitrary non-zero constant, $L$, with the dimensions of energy, which will
of course cancel out in the final result.

Since $P_S(\epsilon)$ is, by assumption, sharply peaked at $\epsilon = \epsilon_0$, and assuming $\sigma_S(\epsilon)$ is relatively slowly varying (as will be
true in typical examples such as the power-law density of states case treated below) the value of the first integral will
be very well approximated by $\log(L\sigma_S(\epsilon_0))$. The second integral will obviously take the form $\log(L/Q)$ where $Q$ is a
quantity with the dimensions of energy which can in principle be computed in terms of our system and energy-bath
densities of states and the value of $E$. So we will have

$$S_S^{\rm microc} \simeq \log(Q\sigma_S(\epsilon_0)). \tag{73}$$

On the other hand, if we consider the totem to be in a pure state, randomly chosen amongst all states with energy in
the range $[E, E + \Delta]$, then, by (45), we expect the entropy to most probably be very close to $S_S^{\rm modapprox}$ given by

$$S_S^{\rm modapprox} = \int_0^{E_c} P_S(\epsilon)\log(L\sigma_S(\epsilon))\,d\epsilon + \int_{E_c}^{E} P_S(\epsilon)\log(L\sigma_B(E - \epsilon))\,d\epsilon - \int_0^E P_S(\epsilon)\log(LP_S(\epsilon))\,d\epsilon$$

which, by a similar argument, and in view of the definition of $E_c$ (see after Equation (15)), will be very well approximated by

$$S_S^{\rm modapprox} \simeq \min\left(\log(Q\sigma_S(\epsilon_0)),\ \log(Q\sigma_B(E - \epsilon_0))\right)$$

for the same value of $Q$.

We next illustrate the computation of $Q$ in the case where both of system, S, and energy bath, B, have the power
law densities of states (18) which we discussed in Section II. We have

$$\log(L/Q) = \int_0^E P_S(\epsilon)\log(LP_S(\epsilon))\,d\epsilon$$

where $P_S(\epsilon)$ is given by (30) with $\gamma$ as in (32). As long as $N_S$ and $N_B$ are comparable in size, the location, (31), of
the peak, $\epsilon_0$, will be far from the extremities of the interval $[0, E]$ and we may clearly replace the limits of the above
integral by $-\infty$ and $\infty$ with very little error.
So (after shifting the origin of $\epsilon$ to $\epsilon_0$)

$$\log(L/Q) \simeq \log\left(\frac{L}{E}\sqrt{\frac{\gamma}{\pi}}\right) - \frac{\gamma^{3/2}}{E^3\pi^{1/2}}\int_{-\infty}^{\infty}\epsilon^2\exp\left(-\gamma\epsilon^2/E^2\right)d\epsilon.$$

The second term above may easily be calculated by writing it as
$(\gamma^{3/2}/E\pi^{1/2})\,\partial/\partial\gamma\int_{-\infty}^{\infty}\exp(-\gamma\epsilon^2/E^2)\,d\epsilon$. This is equal to $(\gamma^{3/2}/E\pi^{1/2})\,\partial/\partial\gamma\,(E\pi^{1/2}\gamma^{-1/2}) = -1/2$. So

$$\log(L/Q) \simeq \log\left(\frac{L}{E}\sqrt{\frac{\gamma}{\pi}}\right) - \frac{1}{2}$$

or, equivalently,

$$Q = \sqrt{\frac{e\pi}{\gamma}}\,E. \tag{74}$$



For the purposes of the comparison we make in Section II we also calculate $Q$ for a thermal state at some given
inverse temperature, $\beta$, of a system, S, with a power law density of states $\sigma(\epsilon) = A\epsilon^N$. By (27) we will now have

$$-\log(L/Q) = -\int_0^\infty \frac{\beta^{N+1}}{N!}\epsilon^N e^{-\beta\epsilon}\log\left(\frac{L\beta^{N+1}}{N!}\epsilon^N e^{-\beta\epsilon}\right)d\epsilon = -\log\left(\frac{L\beta}{N!}\right) - \frac{\beta^{N+1}}{N!}\int_0^\infty\left(N\log(\beta\epsilon) - \beta\epsilon\right)\epsilon^N e^{-\beta\epsilon}\,d\epsilon.$$

We may do the integral here by noticing that $\int_0^\infty x^n\log x\,e^{-bx}\,dx = d/d\alpha\int_0^\infty x^\alpha e^{-bx}\,dx\,|_{\alpha=n} = d/d\alpha\,(b^{-(\alpha+1)}\Gamma(\alpha+1))|_{\alpha=n} = -(\log b)\,b^{-(\alpha+1)}\Gamma(\alpha+1) + b^{-(\alpha+1)}\,d/d\alpha\,\Gamma(\alpha+1)|_{\alpha=n} = -(\log b)\,b^{-(n+1)}n! + b^{-(n+1)}\Gamma(n+1)\psi(n+1)$ where
$\Gamma$ denotes the gamma function and $\psi(n+1)$ the digamma function (see e.g. [30]) of $n+1$ which (see again [30]) is
equal to $\sum_{k=1}^{n} 1/k - C$ where $C$ is Euler's constant ($= 0.5772\ldots$). Using this, we conclude that $-\log(L/Q) =$

$$-\log(L\beta) + \log N! - N\psi(N+1) + N + 1$$

which, using Stirling's approximation (which tells us that $\log n! = (n+1/2)\log n - n + (1/2)\log(2\pi) + 1/(12n) + O(1/n^2)$)
and the asymptotic expansion of $\psi(n+1)$ ($= \log n + 1/(2n) + O(1/n^2)$) is equal to

$$-\log(L\beta) + \frac{1}{2}\log N + \frac{1}{2}\log(2\pi) + \frac{1}{2} + O(1/N) \tag{75}$$

from which we conclude that

$$Q = \frac{\sqrt{2e\pi N}}{\beta}. \tag{76}$$



We note that if we identify $N$ here with $N_S$ and if $N_B \gg N_S$, then, by (32), $\gamma \simeq N_B^2/2N_S$. If, additionally, we take
$\beta$ in (76) to be $d\log\sigma_B(\epsilon)/d\epsilon|_{\epsilon=E}$ which (with $\sigma_B(\epsilon) = A\epsilon^{N_B}$) equals $N_B/E$, then the $Q$ of (74) and (76) both take
the same value $\sqrt{2e\pi N_S}\,E/N_B$. This agreement is to be expected since, as we discussed in Sections I and II, in this
situation, where S is small and in the case of a total microcanonical state, the reduced density operator of S will be
close to that of a thermal state at inverse temperature $d\log\sigma_B(\epsilon)/d\epsilon|_{\epsilon=E}$. So the agreement of the two $Q$ in this
regime serves as a check on the correctness of our two formulae (74) and (76).

However, in Section II we were interested in comparing the properties of the (as we show there) non-thermal reduced
state of S when $N_S$ and $N_B$ are of comparable size and the totem is in a microcanonical state with the properties
of a thermal state of S with the same expected energy. Treating, for simplicity, the case where $N_S = N_B = N$,
say, the relevant inverse temperature, $\beta$, is $2(N + 1)/E$ ($\simeq 2N/E$ for large $N$) and $\gamma$ (32) is $4N$. We then find
that the 'thermal' $Q$ (Equation (76)) becomes $\sqrt{e\pi/2N}\,E$ whereas the 'microcanonical' $Q$ (Equation (74)) becomes
$(1/2)\sqrt{e\pi/N}\,E$. Thus the entropy of the comparison thermal state of S is bigger than the entropy, $S_S^{\rm microc}$, of the
reduced state of S by $(\log 2)/2$. While this is a small difference it is conceptually significant that it is not zero.
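The two values of $Q$ can be compared numerically. The sketch below evaluates the exact expression $\log Q = -\log\beta + \log N! - N\psi(N+1) + N + 1$ (with $L = 1$) alongside the closed forms (76) and (74); the function names and the choice $N = 100$, $E = 1$ are illustrative assumptions.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant C

def digamma_int(n):
    """psi(n + 1) = sum_{k=1}^{n} 1/k - C for a non-negative integer n."""
    return sum(1.0 / k for k in range(1, n + 1)) - EULER_GAMMA

def q_thermal_exact(N, beta):
    """Q (with L = 1) for P(e) = beta^(N+1) e^N exp(-beta e) / N!."""
    log_q = -math.log(beta) + math.lgamma(N + 1) - N * digamma_int(N) + N + 1
    return math.exp(log_q)

def q_thermal_asymptotic(N, beta):
    """Large-N form, Equation (76)."""
    return math.sqrt(2.0 * math.e * math.pi * N) / beta

def q_microcanonical(gamma, E):
    """Gaussian-peak form, Equation (74)."""
    return math.sqrt(math.e * math.pi / gamma) * E

# Illustrative case (hypothetical values): N_S = N_B = N = 100, E = 1.
N, E = 100, 1.0
beta = 2.0 * (N + 1) / E                 # comparison inverse temperature
q_th = q_thermal_exact(N, beta)
q_as = q_thermal_asymptotic(N, beta)
q_mic = q_microcanonical(4.0 * N, E)     # gamma = 4N
entropy_gap = math.log(q_th / q_mic)     # thermal entropy minus S_S^microc
print(q_th / q_as, entropy_gap, 0.5 * math.log(2.0))
```

For large $N$ the ratio of the exact and asymptotic thermal $Q$ should be close to 1, and the entropy gap should approach $(\log 2)/2$ up to corrections of order $1/N$.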



B. On the entropy of the totem and more about $\Delta$

Our general framework involves a system, S, and an energy bath, B, comprising a totem. A natural question is:
What is the relationship between the entropy of S, the entropy of B, and the entropy of the totem? The answer to
this question depends, first of all, on whether we are contemplating the traditional, microcanonical, scenario, or the
modern scenario in which the state of the totem is pure - albeit chosen at random amongst the set of states in our
energy range. In the latter, modern, scenario, there is only one entropy: As we mentioned in Sections I E and IV, the
(von Neumann) entropy of S is equal to the (von Neumann) entropy of B and both are equal to the {system}-{energy
bath} entanglement entropy of the totem; the von Neumann entropy of the totem is of course zero.

In the microcanonical scenario, one might, naively, expect the entropy of the totem to be the sum of the entropy of
S and the entropy of B. But, as we shall see, this is not true. One way to see that it cannot be true is to notice (see
again below) that the entropy of the totem (which will obviously have the value $\log M$, $M$ as in (J3J) and Q) - below
we shall call this $S_{\rm totem}^{\rm microc}$) depends on the width, $\Delta$, of our energy band $[E, E + \Delta]$ whereas, as we saw in Section
IV (for a suitable range of $\Delta$) the entropies of S and B do not. What is of course always true (and applies to both
modern and microcanonical scenarios) is the property of subadditivity [31] which guarantees that the entropy of
the totem must be less than or equal to the sum of the entropies of S and B.

Actually, it turns out, in all three cases we have studied here (i.e. with system and energy bath densities of states
of power-law form and [equal] exponential or quadratic exponential form) that the sum of the entropies of S and B
is close to the entropy of the totem. We have not attempted to formulate any general precise statement of what we
mean by this, nor have we attempted to offer any general explanation as to why this should be the case but content
ourselves simply with making the content of this statement manifest for each of our three density-of-states models:
For our (equal) exponential densities of states (34) we notice that, by (35)

$$S_{\rm totem}^{\rm microc} = \log M = bE + \log(c^2E\Delta)$$

whereas, by (54)

$$S_S^{\rm microc} + S_B^{\rm microc} = bE + \log(c^2E^2).$$

For our (equal) quadratic exponential densities of states (58), by (62)

$$S_{\rm totem}^{\rm microc} = \log M = qE^2 + \log\left(\frac{K^2\Delta}{qE}\right)$$

whereas (see (69) and the paragraph after (70))

$$S_S^{\rm microc} + S_B^{\rm microc} = qE^2 + O(1).$$

For our power-law densities of states (18), we first obtain a good approximate formula for $M$ by noting that, by (10), the
maximum value of the normalized energy probability density is $P_S(\epsilon_0) = (\Delta/M)A_SA_B\,\epsilon_0^{N_S}(E - \epsilon_0)^{N_B}$ ($\epsilon_0$ as in
(31)), which, by our Gaussian approximation (30), is well-approximated by
$(1/E)\sqrt{\gamma/\pi}$, $\gamma$ as in (32). Thus we have (to a very good approximation)

$$S_{\rm totem}^{\rm microc} = \log M = \log\left(A_SA_B\,E\Delta\sqrt{\frac{\pi}{\gamma}}\,\epsilon_0^{N_S}(E - \epsilon_0)^{N_B}\right)$$

whereas, by (73) for S and its obvious counterpart for B and (74),

$$S_S^{\rm microc} + S_B^{\rm microc} = \log\left(A_SA_B\,\frac{e\pi}{\gamma}E^2\,\epsilon_0^{N_S}(E - \epsilon_0)^{N_B}\right).$$

We see that subadditivity in the exponential case entails $\Delta < E$. In the quadratic exponential case (and neglecting
the $O(1)$ term) it entails $\Delta < qE/K^2$, and in the power-law case, it entails $\Delta < e\sqrt{\pi/\gamma}\,E$. When $N_S = N_B = N$,
say, $\gamma = 4N$ and this latter inequality amounts to $\Delta < (e\sqrt{\pi}/2)(E/\sqrt{N})$. The first of these inequalities ($\Delta < E$) is
obviously consistent with almost any sort of smallness assumption on $\Delta$. The other two inequalities indicate a need
to be more precise than we were, in our rather sketchy remarks in Section I and in our subsequent derivations, about
what is the appropriate range of $\Delta$ for any given pair of densities of states, $\sigma_S$ and $\sigma_B$, in order for our arguments and
approximations to be valid. We shall, however, not pursue this issue further in the present paper except to deduce
that the above inequalities must be necessary conditions on the value of $\Delta$.



VIII. MORE ABOUT THERMALITY: PURIFICATION 



Throughout the preceding sections we have assumed (see the paragraph after Equation (2)) that both our system,
S, and our energy bath, B, have densities of states which are positively supported (i.e. the Hamiltonians $H_S$ and $H_B$
are positive) and monotonically increasing and we have been concerned exclusively with totem states which are (close
to) stationary states for a totem Hamiltonian, (2), which weakly couples S and B. In this short section, we point
out that the prospects for the thermality of either S or B become much less constrained if we relax some of these
assumptions. In particular, and in the spirit of the 'modern' approach, given any system density of states, $\sigma_S(\epsilon)$,
whatsoever (provided only it grows sufficiently slowly for the desired thermal state to exist) one can always find an
energy bath density of states and a pure totem state such that the reduced state of the system is an exactly thermal
state at any given temperature.

In fact, there is a well-known procedure, in the spirit of the modern approach, known as 'purification' (see e.g. [33]
and references therein; see also the papers on 'thermofield dynamics' [34] and [35] which are based on the same idea
- we note that the term 'purification' seems to be due to Powers and Størmer [33]) by which a system, S, with any
density of states whatsoever (but we shall assume it to be positively supported and monotonically increasing) and in
any non-pure state one wishes to prescribe (but we are interested in a thermal state at some inverse temperature $\beta$)
may be provided with a notional energy bath, B, such that there is a choice of pure state on the resulting totem for
which the reduced density operator on S is equal to the prescribed state.

The essential idea of purification is based on the fact that any density operator, $\rho$, on a Hilbert space, $\mathcal{H}$, takes the
form

$$\rho = \sum_i p_i\,|\psi_i\rangle\langle\psi_i|,$$

the $p_i$ being positive numbers which sum to 1 and $\psi_1, \psi_2, \ldots$ being an orthonormal basis for $\mathcal{H}$, and on the observation
that this can be viewed as arising as the partial trace over the second copy of $\mathcal{H}$, in the tensor product $\mathcal{H}\otimes\mathcal{H}$, of the
pure density operator, $|\Psi\rangle\langle\Psi|$, where

$$\Psi = \sum_i p_i^{1/2}\,\psi_i\otimes\psi_i.$$

The easiest way to see this is to notice that, for any linear operator, $A$, on $\mathcal{H}$,

$$\langle\Psi|(A\otimes I)\Psi\rangle_{\mathcal{H}\otimes\mathcal{H}} = {\rm tr}(\rho A)_{\mathcal{H}}.$$

If we now specialize to the thermal situation where the $\psi_i$ are the energy eigenstates of a Hamiltonian, $H$, on $\mathcal{H}$ with
energy eigenvalues say $\epsilon_i$ and $p_i = Z^{-1}e^{-\beta\epsilon_i}$ and identify $\mathcal{H}$ both with the system Hilbert space, $\mathcal{H}_S$, and also with
the Hilbert space, $\mathcal{H}_B$, of our notional energy bath, then, if we also take the energy bath Hamiltonian to equal $H$
(and therefore the energy levels of the energy bath to be the same as the energy levels of the system) then $\Psi$ provides
us with a pure totem state with the property that the reduced state of the system (and also the reduced state of the
energy bath) is exactly thermal. With the notation of Section I A we would write $\mathcal{H}_B = \mathcal{H}_S$ and take $\Psi$ on $\mathcal{H}_S\otimes\mathcal{H}_B$
to be given by

$$\Psi = Z^{-1/2}\sum_{\epsilon=\Delta}^{\infty} e^{-\beta\epsilon/2}\sum_{i=1}^{n_S(\epsilon)} |\epsilon, i\rangle\otimes|\epsilon, i\rangle. \tag{77}$$

Then the partial trace of $|\Psi\rangle\langle\Psi|$ over $\mathcal{H}_B$ will equal the $\rho_S^{\rm Gibbs}$ of Equation (13).
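The purification construction can be illustrated on a small, finite example. The sketch below assumes a hypothetical three-level spectrum with degeneracies, builds the amplitudes of a state of the form (77), and takes the partial trace over the second tensor factor explicitly; the function names and the data are illustrative, not from the paper.

```python
import math

def thermal_purification(levels, beta):
    """levels: list of (energy, degeneracy) pairs.

    Returns the energy of each basis state, the amplitudes
    exp(-beta * e / 2) / sqrt(Z) of Psi = sum_i a_i |i> (x) |i>, and Z.
    """
    energies = [e for e, n in levels for _ in range(n)]
    z = sum(math.exp(-beta * e) for e in energies)
    amps = [math.exp(-beta * e / 2.0) / math.sqrt(z) for e in energies]
    return energies, amps, z

def reduced_state(amps):
    """Partial trace over the second factor of |Psi><Psi|."""
    dim = len(amps)
    # coefficient matrix c[i][j] of Psi in the product basis |i> (x) |j>
    c = [[amps[i] if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    # rho[i][k] = sum_j c[i][j] * c[k][j] (all amplitudes are real here)
    return [[sum(c[i][j] * c[k][j] for j in range(dim)) for k in range(dim)]
            for i in range(dim)]

# Hypothetical spectrum: energies 1, 2, 3 with degeneracies 1, 2, 3.
levels = [(1.0, 1), (2.0, 2), (3.0, 3)]
beta = 0.7
energies, amps, z = thermal_purification(levels, beta)
rho = reduced_state(amps)
gibbs = [math.exp(-beta * e) / z for e in energies]
print([rho[i][i] for i in range(len(energies))])
print(gibbs)
```

The diagonal of the reduced state should reproduce the Gibbs weights $Z^{-1}e^{-\beta\epsilon}$ and all off-diagonal entries should vanish, which is the finite-dimensional content of the statement above.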



This achievement of exact thermality contrasts with the situation discussed in Section I D (see in and after the
paragraph containing Equation (17)) where (on the 'modern approach') the totem state is assumed to be randomly
chosen from amongst totem states in a narrow range of energies for a weakly coupled total Hamiltonian. As we
saw there and in the rest of Part 1, with that assumption, and for systems of comparable size, thermality can
never be achieved exactly and can only be approximately achieved for certain special densities of states - such as,
in particular, the exponential and quadratic exponential cases discussed in Sections III and VI. However, in the
purification mechanism described here, the state, $\Psi$, of the totem is - say if we regard the totem Hamiltonian to be
given by Equation (2) with $H_B = H_S$ (and, say, with no coupling term at all) - clearly not even close to an energy
eigenstate; in other words, the totem is in a highly non-stationary state. We remark that this purification mechanism
does not seem to play much of a role in everyday physics as a mechanism by which a system can get to be hot,
although, interestingly, essentially this mechanism has been made use of in the laboratory [35] to produce thermal
states of photons. (See also the remark about the Unruh effect in the next paragraph and the remarks about quantum
black holes in Section IX.)



We further remark that there is an alternative reinterpretation of Equation (77) in which one ascribes to the 'energy
bath', B, the Hamiltonian $H_B = -H_S$ (and substitutes these Hamiltonians into the totem Hamiltonian formula (2)).
With this interpretation, the state of the totem is an eigenstate of totem energy (with totem energy eigenvalue zero!)
and so a stationary state. But now the density of states of the energy bath is negatively supported! This latter
interpretation can be said to be realized in the Unruh effect (see e.g. [37] and references therein) whereby the vacuum
state of a relativistic quantum field theory in Minkowski space, restricted to a right-Rindler wedge, is a thermal state
with respect to Lorentz boosts; the left-Rindler wedge plays the role of our energy bath, B, and this can be thought
of as a copy of the right-wedge quantum system but with a Hamiltonian which is the negative of the right-wedge
Rindler Hamiltonian.

IX. IMPLICATIONS FOR QUANTUM BLACK HOLES

A. The (problematic) connection between our results on quadratic exponential densities of states in Section VI and black hole thermodynamics

There is a striking, at least superficial, resemblance between Equations (66) and (69) in Section VI - i.e. $\beta = 2qE$ and
$S_S^{\rm microc}$ ($= S_B^{\rm microc}$) $= qE^2/2$ (up to an $O(1)$ correction) - for the inverse temperature and the traditional microcanonical
entropy of a weakly coupled system, S, and energy bath, B, with equal quadratic exponential densities of states (58)
and the equations (see Section I B) $\beta = 8\pi GM$ and $S = 4\pi GM^2$ for the Hawking inverse temperature and entropy
of a Schwarzschild black hole. In fact, if we identify, say (see below), the mean energy of B - i.e. (see (63)) half the
totem mean energy, $E/2$ - with the black hole mass, $M$, and identify $q$ with $2\pi G$, the entropy, $S_B^{\rm microc}$, matches the
Hawking entropy and the (inverse) temperatures match too.
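As a quick check of this matching, written out here as a worked substitution (with $M$ denoting the black hole mass, not the normalization constant of (62)), setting $E = 2M$ and $q = 2\pi G$ in (66) and (69) gives

$$\beta = 2qE = 2\,(2\pi G)(2M) = 8\pi GM, \qquad S_B^{\rm microc} \simeq \frac{qE^2}{2} = \frac{(2\pi G)(2M)^2}{2} = 4\pi GM^2,$$

which are the Hawking values quoted above, the entropy agreeing up to the $O(1)$ correction already noted.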

This might seem to suggest that, if one identifies the system, S, with 'matter' and the energy bath, B, with 'gravity',
then the traditional microcanonical strand of Section VI may provide a good model for a black hole in contact with
its thermal atmosphere in a box and a good explanation for the microscopic origin of the entropy of this system. (Of
course, we must bear in mind that, in this model, the state is only thermal in the approximate sense explained in
Section VI.) And on the other hand, our result, (70), that, with the densities of states (58), the 'modern' totem pure
state entropy, $S_S^{\rm modapprox}$, vanishes (up to a term of order 1 in $E$) might seem to be at odds with our matter-gravity
entanglement hypothesis described in [P7HI9] and in Section I B - which entails that the total matter-gravity state of
a black hole is a pure state. However, we need to realize that the results of Section VI assume that the dynamics of
the totem is governed by a totem Hamiltonian of the schematic form (2) with both S and B (now to be interpreted as
'matter' and 'gravity') Hamiltonians positive and weakly coupled. Yet, notoriously, it seems unlikely to be possible
to have a quantum theory of gravity within the scope of these basic assumptions (albeit these assumptions do seem
to apply to the weak-string coupling limit if the fundamental degrees of freedom are taken to be those of a string
rather than of the gravitational field itself - see Section IX B). Already classical general relativity is nonlinear and,
unlike the situation for energy (mass) is not additive. So the fact that the mean energies of system and energy 
bath (modelling matter and gravity) are equal in our model seems strange. (It is also unclear whether we should 
identify the entropy of the black hole with S'g llcroc as we did above, or with S 1 ™"' 00 + S™ lcmc , which is twice as big, 
or with logM - see also Section VII B ) Furthermore (see the remarks after Equation (63)) in this model, the mean 
energy, E/2, of each of matter and of gravity is anyway just the mean of an energy probability density (the i"s( e ) of 
(68) and of Figure HI) which is peaked at the extremes, e = and e = E, while of course (cf. after Equation (|63|) the 



energy probability densities of matter and gravity are perfectly anticorrelated. So the model predicts large statistical 
fluctuations, with (to the extent that it makes sense to talk about the energy of subsystems in general relativity) 
probability distributions for matter and gravity being such that, approximately with probability one half, gravity 
has all the energy and matter none, and with probability one half, matter has all the energy and gravity none. The 
latter case (where there is presumably no black hole) is then a particular problem because (see the next paragraph) 
presumably the quadratic exponential form of the density of states presupposed in the model for matter becomes 
invalid when a black hole is not present. 

In fact, turning to more specifically quantum aspects, aside from all the usual problems of quantum gravity 
(non-renormalizability etc.) it would seem to be incorrect to assume that one can ascribe a single density of states to 
each of gravity and matter throughout changes of state which include the formation of black holes. Rather it would 
seem that one has to assign, in some sense, a 'state-dependent density of states' to matter; in the absence of black 
holes, the densities of states of common forms of matter (including photons) grow much more slowly than quadratic 
exponentials, while, plausibly, when a black hole is present, they do grow as quadratic exponentials (with subleading 
corrections). (Evidence for this latter statement is provided by the 'brick wall' approach [50-52] which suggests that 
the matter entropy is comparable to the gravitational entropy when a black hole is present. We shall also argue in 
[23] that the string scenario we advocate in Section IX B and discuss further in [22], [23] leads, plausibly, to just such 
a state-dependent density of states for matter.) Moreover, the situation is further complicated by the very different 
status of the concept of 'time' in general relativity from that presupposed in traditional formulations of quantum 
theory. 

In the light of all these problems and difficulties, and of our current lack of knowledge as to how to resolve them (other 
than to assume that a black hole is an [ill-understood] strong string-coupling limit of a certain [better understood] 
state of string theory at weak coupling - see Section IX B) it seems to us still reasonable to cling to our matter-gravity 
entanglement hypothesis and our entanglement picture of black hole equilibrium (see Section I B). Indeed, there would 
seem just as much reason to believe in a model along the following 'modern' lines (inspired by the idea of 'purification' 
outlined in Section VIII) as to believe in the above model within the traditional microcanonical strand of Section VI. 

A tentative possible 'modern' model with the correct Hawking value for the entropy: 'Matter' is modelled as a 
'system', 'S', and gravity as an 'energy bath', 'B', which each have a density of states as in (58), and the total state 
of the matter-gravity totem which corresponds to a black hole of mass $\mathcal{M} = E/2$ is modelled by the pure density 
operator $|\Psi\rangle\langle\Psi|$ where $\Psi$ is 

$$\Psi = \frac{1}{\sqrt{M}} \sum_{\epsilon=\Delta}^{E} \sqrt{n_{\rm B}(E-\epsilon)} \sum_{i=1}^{n_{\rm S}(\epsilon)} |\epsilon, i\rangle_{\rm S} \otimes |\epsilon, i\rangle_{\rm B} \qquad (78)$$

where (cf. (58)) $n_{\rm B}(\epsilon) = \exp(q\epsilon^2)\Delta$ and $q = 2\pi G$. 

This state is easily seen to have partial traces over S and B identical to those, i.e. the $\rho_{\rm S}^{\rm microc}$ and $\rho_{\rm B}^{\rm microc}$ of (11), of 
the microcanonical state (6), $\rho^{\rm microc}$, for the same densities of states (i.e. (58)) and thus, obviously, it will equally well 
predict an inverse temperature of $2qE$ and a system entropy (= energy bath entropy) of $qE^2/2$ (plus the same $O(1)$ 
correction). 

While we have argued that this latter 'modern' model is no less justified than our above microcanonical model, in 
view of the difficulties and problems mentioned above, it is still quite unclear what status should be assigned to it or 
how seriously it should be taken. There is also an apparent flaw in this tentative model in that the pure state of the 
totem is far from being an energy eigenstate. It could possibly be that this is the best one can do when one attempts 
to force a strong-coupling situation into a weak-coupling mould, or maybe the model should be modified along the 
lines of the alternative reinterpretation of Equation (77) in Section VIII so that the totem state is modelled as an 
energy eigenstate, at the expense of having densities of states which are not monotonically increasing and/or not 
positively supported. Finally, there is the same strange feature that we raised above for our microcanonical model, 
that both the gravity and the matter are modelled as having, on average, exactly half of the mass (i.e. $E/2$) of the 
totem. Also, as in the microcanonical model, matter and gravity are predicted to 
each have the same energy probability density (the $P_{\rm S}(\epsilon)$ of (68) and of Figure 4) with equal-sized peaks at $\epsilon = 0$ 
and $\epsilon = E$. Albeit, interestingly (and related to the fact that the totem state is far from being an energy eigenstate) 
this model differs from the microcanonical model in that the two energy probability densities are now no longer 
anticorrelated but, instead, perfectly correlated: One sees immediately from (78) that when gravity has energy near 
0, so will matter, and when gravity has energy near $E$, so will matter. So at least one of the problems we mentioned 
above (the one we referred to as a "particular problem") for the microcanonical model seems to be alleviated in the 
above proposed 'modern' model. Another problem which is alleviated with this tentative modern model is that it is 
clear, in this modern model, that the entropy should be identified with $S_{\rm B}^{\rm microc}$, whereas, as we remarked above, in 
the microcanonical model it was not clear whether it should be identified with $S_{\rm B}^{\rm microc}$ or with (approximately - see 
Section VII B) twice this value; in the modern model, there is only one entropy - i.e. the S-B entanglement entropy 
($= S_{\rm B}^{\rm modern} = S_{\rm S}^{\rm modern} \ne S_{\rm B}^{\rm modapprox} = S_{\rm S}^{\rm modapprox}$). 

One would also wish to be able to relate our discussion, in Section VII A, of cases where the probability density is 
sharply peaked, to the results, [13], of Hawking on his microcanonical approach to quantum black holes. These latter 
results of Hawking do seem to form a physically compelling and coherent picture and one would like to understand 
whether and, if so, how, they can be reconciled with the modern strand of results in Section VII A even though they 
seem, superficially, to be easier to understand in terms of the microcanonical strand of work there and seem, 
superficially, to be at odds with the modern strand. We hope to address this question elsewhere. Suffice it to say 
here that, again, the difficulties and problems mentioned above are of at least equal relevance also to this issue and 
thus it seems difficult, also for this issue, to reach a fully convincing conclusion either way [Hj. 

B. Towards a better understanding of black hole entropy in terms of string entropy 

Where we have been able to make what we think is a persuasive case for the relevance of the 'modern' strand of ideas in 
the present paper - used in combination with the (see Section I B) matter-gravity entanglement hypothesis of [17-19] 
- to the understanding of black hole entropy is with a model which relates the work in Sections III and V here, 
concerning densities of states which grow exponentially with energy, to an understanding of black hole entropy based 
on the idea that black holes are strong string-coupling limits of states of weakly coupled string theory. This application 
of our work to quantum black holes seems to be better founded because, unlike in the situations discussed above, 
we would expect the general assumptions we made at the outset here (positive Hamiltonians, weak coupling etc.) to 
be applicable to the weak-coupling regime of string theory. 

In 1993, Susskind [38] proposed, and in 1997, Horowitz and Polchinski [40, 41] gave further evidence, of a 
semi-qualitative nature, for the hypothesis that a (say 4-dimensional, Schwarzschild) black hole can be interpreted as the 
strong string-coupling limit of a certain state of string theory at weak coupling consisting of a (single) long string. 
These authors argued that one obtains, with this interpretation, an explanation of black hole entropy in terms of 
known formulae, based on the 'counting of states', in string theory at low string coupling. Moreover, in related 
work on extremal, and near extremal, black holes (an important early paper was Strominger and Vafa [39]), full 
quantitative agreement was found between the results of such a string theory approach to black hole entropy (and 
other related quantities) and the previously established Hawking formulae. It was then claimed that this work not 
only gave a microscopic explanation for black hole entropy but also, in view of the fact that string theory is a standard 
quantum mechanical theory with a unitary time evolution, that it resolved the Information Loss Puzzle (see Section 
I B). We next wish to argue that, while we agree all this work seems to provide us with an important clue towards the 
microscopic understanding of black-hole entropy which, plausibly, may well turn out to be consistent with a resolution 
to the Information Loss Puzzle, it cannot, by itself, be regarded as a complete explanation of these things. (We shall 
expand on these arguments in two companion papers, [22] and [23].) Our point is simply that what is actually 
calculated in the cited work (for example in [39]) is not the entropy of a particular black hole state, but rather the 
(logarithm of the) degeneracy of a given black hole energy level. No explanation was given in the cited work as to 
why the logarithm of this degeneracy should be interpreted physically as an entropy. After all, the $n$'th energy level 
of the textbook non-relativistic Hydrogen-atom Hamiltonian has a degeneracy of $n^2$ but we would not predict from 
this that a Hydrogen atom has an entropy of $\log n^2$! Of course it is true that the logarithm of the degeneracy of an 
energy level is the same thing as the von Neumann entropy of the microcanonical density operator $(1/d)\sum_{i=1}^{d} |i\rangle\langle i|$ 
where we denote by $|i\rangle$ ($i = 1 \ldots d$) the elements of a basis of states with the given energy. But if we were to attempt 
to interpret e.g. the Strominger-Vafa results as meaning that a black hole should be modelled by such an (impure!) 
microcanonical state, then the Information Loss Puzzle would surely return: How, in a string theoretic description 
of a dynamical process of black hole formation, can a presumably pure initial string theory state evolve into such 
an (impure) microcanonical state? (Such a microcanonical state also wouldn't fit with our picture of black holes as 
thermal states.) 
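
The distinction drawn above, between the logarithm of a degeneracy and the entropy of a particular state, can be illustrated with the following minimal sketch. It assumes only the standard definition of the von Neumann entropy in terms of the eigenvalues of a density operator; the function names are hypothetical and chosen here for illustration.

```python
import math

def von_neumann_entropy(eigenvalues):
    """S = -sum p log p over the non-zero eigenvalues of a density operator."""
    return sum(-p * math.log(p) for p in eigenvalues if p > 0.0)

def microcanonical_eigenvalues(d):
    """Eigenvalues of the microcanonical density operator (1/d) sum_i |i><i|."""
    return [1.0 / d] * d

n = 3        # Hydrogen-atom principal quantum number (illustrative choice)
d = n**2     # degeneracy of the n'th level

# The impure microcanonical state has entropy log d ...
print(von_neumann_entropy(microcanonical_eigenvalues(d)), math.log(d))
# ... whereas any single (pure) state in the same level has entropy zero.
print(von_neumann_entropy([1.0] + [0.0] * (d - 1)))
```
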

Actually, the Horowitz-Polchinski work is couched in terms, not of the degeneracy of a particular energy level of 
string theory, but rather of the (averaged out) density of states of a long string. The problem is then compounded by 
the fact that a density of states is not a dimensionless quantity, so it is not physically meaningful to take its logarithm 
[45]. (In fact, remarks similar to those we make in Paragraph (a) of Section VII A apply.) 

Focusing on this Horowitz-Polchinski work, we shall next propose a modified version of their scenario, based on the 
modern strand of Sections III and V of the present paper, which seems to overcome the above difficulties and to offer 
the promise of a fully satisfactory explanation of black hole entropy in terms of string theory, consistent with unitarity 
and consistent with a resolution to the Information Loss Puzzle - namely with the resolution to the Information Loss 
Puzzle we proposed in [17-19] based on our matter-gravity entanglement hypothesis (see above and Section I B). This 
will be further discussed in [22] and further developed in [23], whose content we indicate very briefly at the end of 
this subsection. 

We begin by briefly recalling the basic argument of Susskind, Horowitz and Polchinski [38, 40] as expounded in [41]. 
Their basic hypothesis is that, as one scales the string length scale, $\ell$, up and the string coupling constant, $g$, down 
from their physical values, keeping Newton's gravitational constant, $G = g^2\ell^2$, fixed, a (4-dimensional) Schwarzschild 
black hole of mass $\mathcal{M}$ will turn into a long string with roughly the same energy, $\epsilon = \mathcal{M}$. The density of states of 
such a long string, in the limit of weak coupling, is known, very roughly (i.e. omitting an approximately inverse-power 
prefactor - see below) to take the exponential form 

$$\sigma_{\rm long\ string}(\epsilon) = C_{\rm ls}\, e^{\ell\epsilon} \qquad (79)$$

($C_{\rm ls}$ is a constant with the dimensions of inverse energy of the same order of magnitude as $\ell$). 
The gist of the argument is that the 'logarithm' of this is approximately given by 

$$S_{\rm long\ string} = \ell\epsilon \qquad (80)$$

and they refer to this quantity as (approximately) the 'entropy of the long string at energy $\epsilon$'. They then argue that 
this should be equated with the entropy of a (Schwarzschild) black hole provided that one does the equating (i.e. 
during the process of scaling $\ell$ described above) when, to within an order of magnitude or so, 

$$\ell = G\mathcal{M} \qquad (81)$$

which is roughly the 'size' of the black hole. (Cf. the fact that the Schwarzschild radius is $2G\mathcal{M}$.) 

Combining (80) and (81) (and replacing $\epsilon$ by $\mathcal{M}$) they thus claim to predict that the entropy of the black hole will 
be within an order of magnitude or so of a constant times $G\mathcal{M}^2$ which agrees, up to an undetermined value for the 
constant, with the Hawking value, $4\pi G\mathcal{M}^2$, for the entropy of a black hole. 

In our view, what one is actually entitled to say, instead, is that the logarithm of the number of energy eigenstates of a 
long string in a band of width $\Delta$ around energy $\epsilon$ will, by (79), be $\ell\epsilon + \log(C_{\rm ls}\Delta)$, which, for a 'reasonable-sized' $\Delta$, will 
be approximately the same as $\ell\epsilon$. The argument in the previous two paragraphs then tells us that the logarithm of the 
number of energy eigenstates of a black hole in a band of width $\Delta$ around energy (i.e. mass) $\mathcal{M}$ will be within an 
undetermined constant, say $C$, of the order of 1, times $G\mathcal{M}^2$ (and thus, by the way, that the density of states of a 
Schwarzschild black hole should behave roughly as a constant times $\exp(CG\epsilon^2)$). However, in our view, it remains a 
challenge to explain why the logarithm of this formula for the density of states of a black hole should equal (to within 
an unknown constant, $C$, of the order of 1) the Hawking formula for black hole entropy. 

In our attempt to meet this challenge, we first posit that the scenario in which a black hole goes over to a single 
long string should be replaced by a scenario in which an equilibrium state (i.e. energy eigenstate) consisting of a black 
hole in contact with its matter atmosphere in a suitable box (on our view described by a pure total state - see our 
'entanglement picture of black hole equilibrium' in Section I B) with approximate total energy, $E$, goes over (again, as 
one scales the string length scale, $\ell$, up and the string coupling constant, $g$, down from their physical values, keeping 
Newton's gravitational constant, $G = g^2\ell^2$, fixed) to a (pure) equilibrium state, with a similar total energy, consisting 
of a single long string in contact with an atmosphere of small strings in a suitably rescaled box. 

We now assume that the density of states of the long string takes (to the same rough approximation as above) the 
form of (79) and that the density of states of the stringy atmosphere, $\sigma_{\rm string\ atmosphere}$, takes (again, to a rough 
approximation) the similar, exponential, form 

$$\sigma_{\rm string\ atmosphere}(\epsilon) = C_{\rm sa}\, e^{\ell\epsilon}. \qquad (82)$$

If we now regard (most of) the stringy atmosphere as corresponding to 'matter' and as playing the role of our 
'system', S, and the long string as corresponding to (most of) 'gravity' and as playing the role of our energy-bath, 
'B', then it is plausible that these may be described by Hilbert spaces and Hamiltonians which, since we are at weak 
string coupling, should fall within the scope of the present paper, and in particular the dynamics should be described 
by a totem Hamiltonian of form (2). In view of the exponential growth of the densities of states, (79) and (82), we 
may therefore apply the formalism of Sections III and V (modified as explained in Endnote [55] to take into account 
the different prefactors, $C_{\rm sa}$ and $C_{\rm ls}$). In particular, the modern strand of these sections tells us that a typical pure 
equilibrium state of our {string atmosphere}-{long string} totem with energy around $E$ will, with a high probability, 
have an entropy very close to that given by Equation (55) with $b = \ell$ (with the modification to the logarithmic term 
given in Endnote [29]), i.e., ignoring the logarithmic term, by 

$$S = \ell E/4,$$

while the (expected) energy of the long string, $\epsilon_{\rm long\ string}$, will (see again Endnote [29]) be given by 

$$\epsilon_{\rm long\ string} = E/2$$

(and, of course, the mean energy of the stringy atmosphere will also be $E/2$ in this model). 

In parallel to the philosophy of [38, 40, 41], we now assume that, when we scale $g$ back up and $\ell$ down, keeping 
$G = g^2\ell^2$ constant and keeping $\Psi$ the 'same', we can equate $\epsilon_{\rm long\ string}$ with the mass, $\mathcal{M}$, of the black hole when 
$\ell = XG\mathcal{M}$, say, where $X$ is an adjustable parameter of the order of 1. We thereby obtain the prediction $S = XG\mathcal{M}^2/2$ 
as the value for the entanglement entropy of black hole and thermal atmosphere in the 'same' (i.e. after rescaling) 
state. But it is also plausible (as indicated above - see the relevant Endnote in [23] for further discussion) that this 
is approximately the same as the entanglement entropy between gravity and matter which, according to the 
matter-gravity entanglement hypothesis of [17-19] and Section I B, is the physical entropy of the black hole. We thus predict 
that the physical entropy of our black hole is (approximately) $XG\mathcal{M}^2/2$. This agrees with the Hawking entropy of 
$4\pi G\mathcal{M}^2$ if we take $X = 8\pi$. 

Furthermore, we showed, in Section III, that both S and B will be '^-approximately semi-thermal', in the sense 
explained there, at inverse temperature $\beta = \ell$. Equating this with the inverse black hole temperature when $\ell = XG\mathcal{M}$ 
predicts a black hole inverse temperature of $XG\mathcal{M}$ which, intriguingly, agrees with the inverse Hawking temperature 
for the same value of $X$ (i.e. $8\pi$). We remark that we would not have correctly predicted both Hawking temperature 
and Hawking entropy for a single value of $X$ had we followed the traditional microcanonical, instead of the modern, 
strand of Section III, nor if we had adopted the approach of [38, 40, 41] and defined the inverse temperature by 
$\beta = d(\log\sigma_{\rm long\ string}(\epsilon))/d\epsilon$. (In each case, the necessary values of $X$ for fitting the Hawking entropy and the Hawking 
temperature would have differed by a factor of 2.) However we caution that it is not clear whether this nice feature 
of our modern model with exponential densities of states survives when (see next paragraph) we improve the model 
to include the appropriate approximately inverse-power prefactors. We discuss this further in [23]. Nevertheless, our 
main point is that a 'modern' model for black hole entropy, based on our matter-gravity entanglement hypothesis, 
seems able to predict a temperature of the order of the Hawking temperature and an entropy of the order of the 
Hawking entropy. 
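
The statement that a single value of $X$ fits both Hawking quantities in the modern model can be checked numerically with the following minimal sketch. It uses only the relations quoted above ($S = \ell E/4$, $\beta = \ell$, long-string energy $E/2$ equated with the black hole mass, and $\ell = XG\mathcal{M}$); the function name and the numerical values are assumptions made here for illustration, and logarithmic corrections and prefactors are ignored.

```python
import math

def modern_string_model(X, G, mass):
    """Entropy and inverse temperature of the exponential-density modern model."""
    ell = X * G * mass       # string length scale at the matching point
    E = 2.0 * mass           # the long string carries E/2, equated with the mass
    entropy = ell * E / 4.0  # S = ell E / 4, logarithmic term ignored
    beta = ell               # semi-thermal inverse temperature
    return beta, entropy

G, mass = 1.0, 2.5           # arbitrary illustrative values (assumptions)
beta, entropy = modern_string_model(X=8.0 * math.pi, G=G, mass=mass)

print(beta, 8.0 * math.pi * G * mass)        # compare with Hawking inverse temperature
print(entropy, 4.0 * math.pi * G * mass**2)  # compare with Hawking entropy
```
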
The main deficiency in the above scenario is the adoption of the equations (79) and (82) for the approximate forms 
of the long-string and stringy-atmosphere densities of states. These formulae omit important (dimension-dependent) 
approximately inverse-power prefactors, and when one takes these into proper account, it turns out (with some, 
seemingly reasonable, assumptions) that the account of the origin of black hole entropy above and in [22] needs 
significant changes and is even, in certain respects, misleading, although one arrives at similar final conclusions. The 
prefactors are also needed to explain why an equilibrium weakly coupled string state in a box consists of a single long 
string surrounded by an atmosphere of short strings as we posited above. Also, the statistical spread in energy of the 
string (and hence the predicted statistical spread in energy of the black hole) around the mean energy $E/2$ will be 
altered with the correct prefactors. All these matters will be discussed in our second companion paper [23]. 



Part 2: Full explanation of the formula (15) and arguments for the validity of the proposition in Section I D 

X. THE WORK OF LUBKIN AND PAGE AND OTHER PRELIMINARIES AND OUTLINE OF THE REMAINDER OF PART 2 

In Section I, we mentioned the work of Lubkin, [10], where it is shown that a randomly chosen pure density 
operator, $\rho^{mn} = |\Psi\rangle\langle\Psi|$, on the tensor-product Hilbert space, $\mathcal{H}_m \otimes \mathcal{H}_n$, of a pair of quantum systems - $\mathcal{H}_m$ being 
$m$-dimensional and $\mathcal{H}_n$ being $n$-dimensional - will, for fixed $m$ and $n \gg m$, have, with high probability, a reduced 
density operator, $\rho_m^{mn}$, on $\mathcal{H}_m$, which is close to the maximally mixed density operator - with components, in any 
Hilbert space basis, ${\rm diag}(1/m, \ldots, 1/m)$. We first need to recall some more details about this work as well as some 
further related developments which will be relevant throughout Part 2. 

Lubkin justified the above statement and made it precise by obtaining a result which is (easily seen to be) equivalent 
to the following exact formula for the mean value (i.e. over Haar measure on the set of unit vectors $\Psi$), 
$\langle{\rm tr}((\rho_m^{mn})^2)\rangle$, of ${\rm tr}((\rho_m^{mn})^2)$. In our notation 

$$\langle{\rm tr}((\rho_m^{mn})^2)\rangle = \frac{m+n}{mn+1}. \qquad (83)$$

We shall re-derive this result of Lubkin with a somewhat different method in the next section (Section XI) since our 
full explanation of the formula in Equation (15) and our arguments for the validity of our proposition of Section I D 
will be closely based on it. Lubkin then gave a simple general argument which amounts to the statement that, for 
any density operator, $\rho_m$, on an $m$-dimensional Hilbert space, whenever $m\langle{\rm tr}(\rho_m^2)\rangle - 1 \ll 1$, then the mean value, 
$\langle S(\rho_m)\rangle$, of the von Neumann entropy, $S(\rho_m)$, of $\rho_m$ will be well-approximated by 

$$\langle S(\rho_m)\rangle \simeq \log m - \frac{1}{2}\left(m\langle{\rm tr}(\rho_m^2)\rangle - 1\right). \qquad (84)$$

Applying this result to $\rho_m^{mn}$, (83) implies that whenever $m \ll n$ 

$$\langle S(\rho_m^{mn})\rangle \simeq \log m - \frac{m^2-1}{2(mn+1)} \qquad (85)$$

which may also be regarded as an alternative quantitative expression of the qualitative property that, when $m \ll n$, 
most $\rho_m^{mn}$ must be close to maximally mixed. 
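
As a numerical sanity check on Lubkin's mean-purity formula, $(m+n)/(mn+1)$, the following minimal sketch samples Haar-random unit vectors and averages the purity of the reduced density operator. The sampling method (normalising a vector of independent complex Gaussians) is a standard technique assumed here, not the coordinatization used later in Part 2, and all function names are hypothetical.

```python
import math
import random

def random_pure_state(m, n, rng):
    """Haar-random unit vector in C^(m*n), stored as an m x n array c[a][k]."""
    amps = [[complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
            for _ in range(m)]
    norm = math.sqrt(sum(abs(c) ** 2 for row in amps for c in row))
    return [[c / norm for c in row] for row in amps]

def purity_of_reduced_state(c):
    """tr(rho^2) for rho[a][b] = sum_k c[a][k] * conj(c[b][k])."""
    m = len(c)
    total = 0.0
    for a in range(m):
        for b in range(m):
            rho_ab = sum(x * y.conjugate() for x, y in zip(c[a], c[b]))
            total += abs(rho_ab) ** 2  # rho is Hermitian, so rho_ab * rho_ba = |rho_ab|^2
    return total

def mean_purity(m, n, samples=2000, seed=0):
    """Monte Carlo estimate of the Haar average of tr(rho^2)."""
    rng = random.Random(seed)
    return sum(purity_of_reduced_state(random_pure_state(m, n, rng))
               for _ in range(samples)) / samples

def lubkin_mean_purity(m, n):
    """Exact Haar average quoted in Equation (83)."""
    return (m + n) / (m * n + 1)

print(mean_purity(2, 3), lubkin_mean_purity(2, 3))
```

The estimate should agree with the exact value up to Monte Carlo noise, which shrinks as the number of samples grows.
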



In (84) and (85) above, the von Neumann entropy is defined in the usual way as in Equation (16). 
Around 15 years later, Page [46] showed that the formula 

$$\langle S(\rho_m^{mn})\rangle \simeq \log m - \frac{m}{2n} \qquad (86)$$

is a good approximation (with error term of order $1/mn$) whenever $1 \ll m \le n$, and noted that this agrees with 
(85) on their common domain of validity [47]. We note here, in passing, that, combining the two estimates (85) and 
(86), we can clearly write, simply, that whenever $m \le n$, $\langle S(\rho_m^{mn})\rangle = \log m - m/2n + O(1/mn)$ (since, when $m$ and 
$n$ are both of order 1, the entropy can anyway only be of order 1) and hence, combining this result with $m$ and $n$ 
interchanged with the equality of $S(\rho_m^{mn})$ and $S(\rho_n^{mn})$ (where by $\rho_n^{mn}$ we mean the reduced density operator of $\rho^{mn}$ 
on $\mathcal{H}_n$ - see Section XI), 

$$\langle S(\rho_m^{mn})\rangle = \log(\min(m,n)) - \frac{\min(m,n)}{2\max(m,n)} + O(1/mn). \qquad (87)$$
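
To see how good the approximation (87) is in practice, the following minimal sketch compares it with the exact closed form for the Haar-averaged entropy, $\sum_{k=n+1}^{mn} 1/k - (m-1)/2n$ for $m \le n$. That closed form is the well-known result conjectured by Page and subsequently proved; it is quoted here as an outside fact and does not appear in the text above. The function names are hypothetical.

```python
import math

def page_exact(m, n):
    """Exact Haar-averaged entropy (natural logarithm) of the smaller subsystem."""
    lo, hi = min(m, n), max(m, n)
    return sum(1.0 / k for k in range(hi + 1, lo * hi + 1)) - (lo - 1) / (2.0 * hi)

def page_approx(m, n):
    """Leading terms of Equation (87), with the O(1/mn) remainder dropped."""
    lo, hi = min(m, n), max(m, n)
    return math.log(lo) - lo / (2.0 * hi)

for m, n in [(2, 2), (8, 64), (64, 8), (16, 16)]:
    exact, approx = page_exact(m, n), page_approx(m, n)
    print(m, n, exact, approx, exact - approx)
```

Both functions are symmetric in $m$ and $n$, reflecting the equality of the entropies of the two reduced density operators of a pure state.
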



In the remainder of Part 2, we first introduce, in Section XI, a useful coordinatization for unit vectors in an 
$N$-dimensional Hilbert space in terms of which the 'Haar' measure of Section I takes a particularly convenient form. 
In order to prepare the ground for our subsequent generalization (see below) we then use this coordinatization to 
obtain an alternative derivation of Lubkin's result (83). We also discuss in more detail the qualitative consequence of 
Lubkin's result concerning the 'almost maximal mixedness' of the density operator $\rho_m^{mn}$ when $m \ll n$ (as previously 
pointed out by Lubkin, as mentioned in Section I) and we also point out a related important second qualitative 
consequence concerning the nature of $\rho_m^{mn}$ in the 'opposite' situation when $m \gg n$. Then, in Section XII, we use 
a generalization of our alternative derivation of Equation (83) as well as a suitable generalization of our argument 
for its two qualitative consequences to give the full statement of Equation (15), including an explanation of how the 
$n_{\rm B}(E-\epsilon)$-dimensional subspace of the ($n_{\rm S}(\epsilon)$-dimensional) energy-$\epsilon$ subspace of $\mathcal{H}_{\rm S}$ spanned by the $|\epsilon, i\rangle$ depends on 
$\Psi$, and also to give our argument for the validity of our proposition (stated in full in Section I D) that $\rho_{\rm S}^{\rm modern}$ (see 
the discussion after (15)) is well approximated by the $\rho_{\rm S}^{\rm modapprox}$ of Equation (15). We end, in Section XIII, with two 
calculations which provide confirmatory evidence of the goodness of our approximation in situations such as those we 
discuss in Part 1. 



XI. A USEFUL REPRESENTATION OF HAAR MEASURE AND DETAILS ON, AND FURTHER 

CONSEQUENCES OF, LUBKIN'S RESULT 

Let $\mathcal{H}$ be an $N$-dimensional Hilbert space and let $\{E_1 \ldots E_N\}$ be an arbitrary orthonormal basis. Then, as usual, 
we coordinatize an arbitrary vector, $\psi \in \mathcal{H}$, by the $N$-tuple of complex numbers $(z_1, \ldots, z_N)$ where $\psi = \sum_a z_a E_a$. 
$\psi$ is, of course, then a unit vector if and only if $\sum_{a=1}^{N} |z_a|^2 = 1$. So the set of normalized vectors in our Hilbert space is 
coordinatized as the unit sphere in $\mathbb{C}^N$. (Writing $z_a = x_a + iy_a$ etc. we see that this is obviously the 'same thing' as the 
real unit $(2N-1)$-sphere.) Next we change to polar coordinates in each copy of $\mathbb{C}$ by setting $z_a = r_a e^{i\theta_a}$, whereupon the 
usual volume element $dz_1 \ldots dz_N$ on $\mathbb{C}^N$ takes the form $r_1 \ldots r_N\, dr_1 \ldots dr_N\, d\theta_1 \ldots d\theta_N$. Changing coordinates further 
from $(r_1, \ldots, r_N; \theta_1, \ldots, \theta_N)$ to $(r_1, \ldots, r_{N-1}, R; \theta_1, \ldots, \theta_N)$, where 

$$R = \left(r_1^2 + \cdots + r_N^2\right)^{1/2}, \qquad (88)$$

this volume element is easily seen to become $r_1 \ldots r_{N-1} R\, dr_1 \ldots dr_{N-1}\, dR\, d\theta_1 \ldots d\theta_N$. Next we note that, in these 
latter coordinates, the unit sphere in $\mathbb{C}^N$ is defined by the condition $R = 1$. Thus the usual area element, $dA$, on our 
unit sphere is obtained by setting $R = 1$ and removing the term $dR$ from this formula, i.e. 

$$dA = r_1 \ldots r_{N-1}\, dr_1 \ldots dr_{N-1}\, d\theta_1 \ldots d\theta_N.$$

It is now convenient to replace the coordinates $r_a$ ($a = 1, \ldots, N-1$) by $w_a$ ($a = 1, \ldots, N-1$) where $w_a = r_a^2$, 
whereupon clearly 

$$dA = 2^{1-N}\, dw_1 \ldots dw_{N-1}\, d\theta_1 \ldots d\theta_N.$$



In view of the relation between the $w_a$ and the first $N-1$ of the $r_a$ and the fact that the $r_a$ satisfy (88) with $R = 1$, 
the $(N-1)$-tuple $(w_1, \ldots, w_{N-1})$ clearly takes values which range over the simplex defined by the inequalities $0 \le w_a$ 
for each $a$-value from 1 to $N-1$, together with the inequality $\sum_{a=1}^{N-1} w_a \le 1$. (We shall call this the standard 
$(N-1)$-simplex.) We remark that we can think of the quantity $1 - \sum_{a=1}^{N-1} w_a$ as '$w_N$' for we will then have $w_N = r_N^2$. So the 
latter inequality can then be expressed as $0 \le w_N$. The $\theta_a$ values each, of course, range over $[0, 2\pi)$. So the area of our 
unit sphere is the integral of $dA$ over the above ranges for our variables which is $2^{1-N}(2\pi)^N$ times the volume of our 
simplex. But the latter is easily seen to be $1/(N-1)!$. So the area of our unit sphere is $2\pi^N/(N-1)!$ which, of course, 
is the well-known value for the surface area of the real $(2N-1)$-sphere. We want our Haar measure to be a probability 
measure, so we need to normalize it by dividing by this surface area. In conclusion, (up to an irrelevant set of measure 
zero) we have coordinatized the set of normalized vectors in our $N$-dimensional Hilbert space by products of 
$(N-1)$-tuples $(w_1, \ldots, w_{N-1})$ whose values range over our standard $(N-1)$-simplex, with $N$-tuples $(\theta_1, \ldots, \theta_N)$ whose values 
range over the standard (i.e. with all periods equal to $2\pi$) $N$-torus, and, with this coordinatization, (normalized) Haar 
measure is simply the product 

$$d{\rm Haar} = d{\rm Simplex} \times d{\rm Torus} \qquad (89)$$

where 

$$d{\rm Simplex} = (N-1)!\, dw_1 \ldots dw_{N-1} \qquad (90)$$

and 

$$d{\rm Torus} = (2\pi)^{-N}\, d\theta_1 \ldots d\theta_N. \qquad (91)$$
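
The normalization used above can be cross-checked with the following minimal sketch, which compares the area obtained from the simplex volume with the standard formula $2\pi^{d/2}/\Gamma(d/2)$ for the surface area of the unit sphere in $d = 2N$ real dimensions. The standard formula is an outside fact assumed here, and the function names are hypothetical.

```python
import math

def sphere_area_from_simplex(N):
    """Area as 2^(1-N) (2 pi)^N times the volume 1/(N-1)! of the standard simplex."""
    simplex_volume = 1.0 / math.factorial(N - 1)
    return 2.0 ** (1 - N) * (2.0 * math.pi) ** N * simplex_volume

def sphere_area_closed_form(N):
    """The simplified value 2 pi^N / (N-1)! quoted in the text."""
    return 2.0 * math.pi ** N / math.factorial(N - 1)

def sphere_area_standard(N):
    """Surface area of the unit sphere in d = 2N real dimensions."""
    d = 2 * N
    return 2.0 * math.pi ** (d / 2) / math.gamma(d / 2)

for N in range(1, 6):
    print(N, sphere_area_from_simplex(N), sphere_area_closed_form(N), sphere_area_standard(N))
```
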

For later convenience, we next define, and record, the easy-to-check values of, certain integrals of certain products 
of $w$'s over our simplex w.r.t. $d{\rm Simplex}$: 

$$J_1 := \int w_1\, d{\rm Simplex} = 1/N, \qquad (92)$$
$$J_{11} := \int w_1^2\, d{\rm Simplex} = 2/N(N+1), \qquad (93)$$
$$J_{12} := \int w_1 w_2\, d{\rm Simplex} = 1/N(N+1). \qquad (94)$$

Obviously we assume here, for $J_1$ and $J_{11}$, that $N$ is at least 2, and for $J_{12}$, that $N$ is at least 3. We note that $J_p$ 
(defined as $J_1$ but with $w_1$ replaced by $w_p$) will equal $J_1$ for any other value of $p$ between 1 and $N$. Similarly (and 
with an obvious corresponding notation) $J_{pp} = J_{11}$ for any other $p$ between 1 and $N$, and $J_{qp} = J_{12}$ for any pair of 
different $q$ and $p$ between 1 and $N$. We reiterate that all this holds even if $q$ or $p$ is equal to $N$, in which case, as we 
remarked above, $w_N$ is taken to mean $1 - w_1 - \cdots - w_{N-1}$. 
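
The values (92)-(94) can be checked by Monte Carlo with the following minimal sketch. It assumes the standard fact that normalising $N$ independent unit-rate exponential variables gives a point distributed uniformly on the standard $(N-1)$-simplex (with $w_N$ included as the last coordinate), i.e. distributed according to the normalized measure $d{\rm Simplex}$. The function names are hypothetical.

```python
import random

def sample_simplex(N, rng):
    """Uniform sample (w_1, ..., w_N) with all w_a >= 0 and sum equal to 1."""
    e = [rng.expovariate(1.0) for _ in range(N)]
    total = sum(e)
    return [x / total for x in e]

def estimate_moments(N, samples=50000, seed=1):
    """Monte Carlo estimates of (J_1, J_11, J_12); requires N >= 2."""
    rng = random.Random(seed)
    j1 = j11 = j12 = 0.0
    for _ in range(samples):
        w = sample_simplex(N, rng)
        j1 += w[0]
        j11 += w[0] ** 2
        j12 += w[0] * w[1]
    return j1 / samples, j11 / samples, j12 / samples

def exact_moments(N):
    """The values quoted in Equations (92)-(94)."""
    return 1.0 / N, 2.0 / (N * (N + 1)), 1.0 / (N * (N + 1))

print(estimate_moments(4))
print(exact_moments(4))
```
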

We next use this coordinatization to compute the average, $\langle\rho_m^{mn}\rangle$, of $\rho_m^{mn}$ (see the paragraph before Equation (83) in Section X) over Haar measure (with the result (97) below) and also to (re-)derive Lubkin's formula (83) for $\langle\mathrm{tr}((\rho_m^{mn})^2)\rangle$. (Here, as in Section X, we indicate averages with respect to Haar measure with angle-brackets $\langle\ \rangle$.) Let $\Psi$ be an arbitrary unit vector in $\mathcal{H}_m \otimes \mathcal{H}_n$ and choose (arbitrary) bases, $\{e_1, \ldots, e_m\}$ for $\mathcal{H}_m$ and $\{f_1, \ldots, f_n\}$ for $\mathcal{H}_n$. Then we may write

$$\Psi = c_{ak}\, e_a \otimes f_k \quad \text{(summed over $a$ and $k$)} \tag{95}$$

(where $c_{ak}c^*_{ak}$ [summed over $a$ and $k$] $= 1$) and the reduced density operator (see Sections I and X) $\rho_m^{mn}$ on $\mathcal{H}_m$ takes the form $(\check\rho_m^{mn})_{aa'} |e_a\rangle\langle e_{a'}|$ (summed over $a$ and $a'$ from 1 to $m$) where

$$(\check\rho_m^{mn})_{aa'} = c_{ak} c^*_{a'k} \quad \text{(summed over $k$)}. \tag{96}$$

(The reason for the 'check' is that we will also want, below, to talk about the ($m \times m$) matrix whose components are $(\check\rho_m^{mn})_{aa'}$. And we call this '$\check\rho_m^{mn}$' to distinguish it from the operator $\rho_m^{mn}$ on $\mathcal{H}_m$.) We want to average this over the unit sphere in $\mathbb{C}^N$ for $N = mn$ where each factor of $\mathbb{C}$ accommodates one of the $N = mn$ components of $c_{ak}$. So we replace $c_{ak}$ in (96) by $r_{ak} e^{i\theta_{ak}}$ and then by $w_{ak}^{1/2} e^{i\theta_{ak}}$ and similarly for $c^*_{a'k}$, obtaining

$$(\check\rho_m^{mn})_{aa'} = w_{ak}^{1/2} w_{a'k}^{1/2} e^{i(\theta_{ak} - \theta_{a'k})} \quad \text{(summed over $k$)}$$

and we integrate this over Haar measure (89). Integrating over the $\theta$s first (with $d\mathrm{Torus}$ (91)) will obviously give a factor of $\delta_{aa'}$ for each $k$ in the sum (from 1 to $n$) over $k$. We are thus left with a sum (over $k$) of $n$ integrals,

$$\int w_{ak}\, d\mathrm{Simplex}$$

for each $a$, each of which takes the form of $J_1$ (92) for $N = mn$. So we conclude that

$$\langle(\check\rho_m^{mn})_{aa'}\rangle = \frac{n\,\delta_{aa'}}{mn} = \frac{1}{m}\,\delta_{aa'}$$

and hence, obviously,

$$\langle\rho_m^{mn}\rangle = \frac{1}{m} I_m, \tag{97}$$

where $I_m$ denotes the identity operator on $\mathcal{H}_m$. Similarly, denoting by $\rho_n^{mn}$ the reduced density operator on $\mathcal{H}_n$, we will have

$$\langle\rho_n^{mn}\rangle = \frac{1}{n} I_n. \tag{98}$$
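As an informal illustration of (97), one can sample unit vectors at random and average their reduced density matrices. This sketch assumes the standard fact that a normalised vector of independent complex Gaussians is distributed according to the invariant (Haar) measure on the unit sphere; the helper names are hypothetical.

```python
import random

def random_state(m, n, rng):
    """Coefficients c[a][k] of a random unit vector in C^(m*n)."""
    c = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
         for _ in range(m)]
    norm = sum(abs(z) ** 2 for row in c for z in row) ** 0.5
    return [[z / norm for z in row] for row in c]

def reduced_density_matrix(c):
    """(rho)_{a a'} = sum_k c_{ak} conj(c_{a'k}), as in (96)."""
    m = len(c)
    return [[sum(x * y.conjugate() for x, y in zip(c[a], c[b]))
             for b in range(m)] for a in range(m)]

def average_reduced_density_matrix(m, n, samples=20_000, seed=0):
    """Sample average of the m x m reduced density matrix."""
    rng = random.Random(seed)
    acc = [[0j] * m for _ in range(m)]
    for _ in range(samples):
        rho = reduced_density_matrix(random_state(m, n, rng))
        for a in range(m):
            for b in range(m):
                acc[a][b] += rho[a][b] / samples
    return acc
```

Up to sampling noise, the diagonal entries of the result should be close to $1/m$ and the off-diagonal entries close to zero.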



We remark that we don't strictly need this result for the qualitative consequences of Lubkin's results we discuss below, but it is anyway interesting and also serves as a useful preliminary to the recalculation of Lubkin's result to which we will turn next - see especially the remark after equation (106). More importantly, we will need the counterpart to this result in our argument, below, for the closeness of $\rho_S^{\mathrm{modern}}$ and $\rho_S^{\mathrm{modapprox}}$.

Proceeding similarly, it is straightforward to see that

$$\mathrm{tr}((\rho_m^{mn})^2) = c_{ak} c^*_{a'k} c_{a'l} c^*_{al} = w_{ak}^{1/2} w_{a'k}^{1/2} w_{a'l}^{1/2} w_{al}^{1/2}\, e^{i(\theta_{ak} - \theta_{a'k} + \theta_{a'l} - \theta_{al})} \quad \text{(summed over $k$, $l$, $a$ and $a'$)}.$$

We integrate this over $d\mathrm{Haar}$, again doing the $\theta$-integrals first. Clearly the latter will vanish unless either $a = a'$ or $k = l$ (or both) whereupon, for fixed values of $a$, $a'$, $k$ and $l$, the complex exponential will integrate (with $d\mathrm{Torus}$ (91)) to 1. Moreover, (i) if $a = a'$ and $k = l$, then the $w$-integral over the simplex will equal $J_{11}$ (93) for $N = mn$ - and there are $mn$ such cases; (ii) if $a \neq a'$ and $k = l$, then the $w$-integral over the simplex will equal $J_{12}$ (94) for $N = mn$ - and there are $nm(m-1)$ such cases; and finally (iii) if $a = a'$ and $k \neq l$, then the $w$-integral over the simplex will equal $J_{12}$ for $N = mn$ again, and there are $n(n-1)m$ such cases. Thus we conclude that

$$\langle\mathrm{tr}((\rho_m^{mn})^2)\rangle = \frac{2mn + mn(m-1) + n(n-1)m}{mn(mn+1)} = \frac{m+n}{mn+1} \tag{99}$$

in agreement with (83).
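The counting behind (99) is easy to verify in exact arithmetic. The sketch below simply assembles the three cases (i)-(iii) using the values (93) and (94) with $N = mn$ and compares the total with the closed form; the function names are illustrative only.

```python
from fractions import Fraction

def lubkin_average_purity(m, n):
    """Assemble <tr(rho^2)> from cases (i)-(iii), with N = m*n."""
    N = m * n
    J11 = Fraction(2, N * (N + 1))      # (93)
    J12 = Fraction(1, N * (N + 1))      # (94)
    case_i = m * n * J11                # a = a', k = l
    case_ii = n * m * (m - 1) * J12     # a != a', k = l
    case_iii = n * (n - 1) * m * J12    # a = a', k != l
    return case_i + case_ii + case_iii

def closed_form(m, n):
    """Right hand side of (99)."""
    return Fraction(m + n, m * n + 1)
```

For instance, `lubkin_average_purity(3, 5)` and `closed_form(3, 5)` both evaluate to `Fraction(1, 2)`.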



Lubkin's result (83)/(99) is important for us because of two qualitative consequences. First, as we mentioned in Section X and as Lubkin himself essentially argued, if $n \gg m$ then $\langle\mathrm{tr}((\rho_m^{mn})^2)\rangle$ will be close to $1/m$. The only way this can happen is if most (in the sense we clarify below) totem states have reduced system density operators $\rho_m^{mn}$ close to the maximally mixed density operator $(1/m)I_m$. To see this, notice that (adopting the convention of counting each eigenvalue, $\lambda_a$, $\nu$ times when $\nu$ is its multiplicity) amongst density operators, $\rho$, on an $m$-dimensional Hilbert space, the eigenvalues of $\rho$ have to satisfy both $\sum_{a=1}^m \lambda_a = 1$ (since every density operator has unit trace) as well as $\sum_{a=1}^m \lambda_a^2 = \mathrm{tr}(\rho^2)$ and one easily sees from these two conditions that the minimum value of $\mathrm{tr}(\rho^2)$ is $1/m$ and that this minimum value is attained only when each of the $\lambda_a$ equals $1/m$. If we next consider the set of such $\rho$ for which $\mathrm{tr}(\rho^2)$ is equal to $1/m + \eta$ where $\eta$ denotes a (small) positive number, then one easily sees (again by considering the sum of the eigenvalues and the sum of their squares) that each of the $\lambda_a$ must take the form $1/m + \delta_a$ where $\sum_{a=1}^m \delta_a^2 = \eta$. Applying this result to each of our reduced density operators $\rho_m^{mn}$, now writing the eigenvalues of each of these in the form $1/m + \delta_a$, we immediately see that if $\langle\mathrm{tr}((\rho_m^{mn})^2)\rangle = 1/m + \eta$ (which will hold with $\eta = (m+n)/(mn+1) - 1/m = (m^2-1)/(m(mn+1))$, which will be small compared with $1/m$ if $n \gg m$) then the statement in words:

$$\text{For } n \gg m,\ \rho_m^{mn} \text{ will probably be close to } \frac{1}{m} I_m \tag{100}$$

will hold in the sense that $\langle\sum_{a=1}^m \delta_a^2\rangle = \eta$. Similar results will obviously hold for $\rho_n^{mn}$:

$$\text{For } m \gg n,\ \rho_n^{mn} \text{ will probably be close to } \frac{1}{n} I_n \tag{101}$$

in a similar sense to above.
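For the reader's convenience, the elementary step used above (from the two eigenvalue conditions to the sum of the squared deviations) and the value of $\eta$ can be written out as follows. This is only a worked restatement under the definitions already given, with $\lambda_a = 1/m + \delta_a$ and $\sum_a \delta_a = 0$ (which follows from unit trace).

```latex
\begin{align*}
\mathrm{tr}(\rho^2) &= \sum_{a=1}^m \Bigl(\frac{1}{m} + \delta_a\Bigr)^2
  = \frac{1}{m} + \frac{2}{m}\sum_{a=1}^m \delta_a + \sum_{a=1}^m \delta_a^2
  = \frac{1}{m} + \sum_{a=1}^m \delta_a^2 , \\
\eta &= \frac{m+n}{mn+1} - \frac{1}{m}
  = \frac{m(m+n) - (mn+1)}{m(mn+1)}
  = \frac{m^2 - 1}{m(mn+1)} .
\end{align*}
```

In particular $m\eta = (m^2-1)/(mn+1)$ is roughly $m/n$ for large $n$, so the deviation from the minimum value $1/m$ is relatively small precisely when $n \gg m$.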

Our second qualitative consequence of Lubkin's result for $\rho_m^{mn}$ arises as a corollary to the above statement about $\rho_n^{mn}$. To explain what it is, note first that, just as we had the formula (96) for the components, $(\check\rho_m^{mn})_{aa'}$, of the $m \times m$ matrix $\check\rho_m^{mn}$, so we clearly have that $\rho_n^{mn} = (\check\rho_n^{mn})_{kk'} |f_k\rangle\langle f_{k'}|$ where, in the notation of (95), the $n \times n$ matrix $\check\rho_n^{mn}$ is given by

$$(\check\rho_n^{mn})_{kk'} = c_{ak} c^*_{ak'} \quad \text{(summed over $a$)}. \tag{102}$$

So, denoting by $C$ the ($m \times n$) matrix whose components are the $c_{ak}$ and by $C^+$ its ($n \times m$) adjoint matrix, we clearly have (with ${}^*$ denoting complex conjugation)

$$\check\rho_m^{mn} = CC^+ \quad \text{and} \quad (\check\rho_n^{mn})^* = C^+C. \tag{103}$$

It easily follows from (103) that $x \in \mathbb{C}^m$ can be an eigenvector of $\check\rho_m^{mn}$ with a non-zero (positive) eigenvalue, $\lambda$, if and only if $y = \lambda^{-1/2}(C^+x)^*$ ($\in \mathbb{C}^n$) is an eigenvector of $\check\rho_n^{mn}$ with the same eigenvalue. (The factor of $\lambda^{-1/2}$ is easily seen to be needed if we want to ensure that $y$ is normalized whenever $x$ is normalized.) Moreover we note for future reference (in our digression on the Schmidt decomposition below) that we then have $Cy^* = \lambda^{1/2}x$ - i.e.

$$c_{ak}\, y_k^* = \lambda^{1/2} x_a \tag{104}$$



- the left hand side being summed over $k$. We conclude (continuing to adopt the convention of counting any eigenvalue $\nu$ times if it has multiplicity $\nu$) that, if $m > n$ and if $\rho_n^{mn}$ has eigenvalues $\lambda_1, \ldots, \lambda_n$, then $\rho_m^{mn}$ will have this same set of eigenvalues together with $m - n$ more, all of which will, however, be zero! Moreover (cf. the discussion after equation (100)), since $\langle\mathrm{tr}((\rho_n^{mn})^2)\rangle = 1/n + \eta$ for $\eta$ now equal to $(m+n)/(mn+1) - 1/n = (n^2-1)/(n(mn+1))$ - which will be small compared with $1/n$ if $m \gg n$ - we will have $\lambda_k = 1/n + \delta_k$ where $\langle\sum_{k=1}^n \delta_k^2\rangle = \eta$. So, we may say that

$$\text{For } m \gg n,\ \rho_m^{mn} \text{ will probably be close to } \frac{1}{n}\sum_{k=1}^{n} |\tilde e_k\rangle\langle\tilde e_k| \tag{105}$$

where $\{\tilde e_1, \ldots, \tilde e_n\}$ is a basis for an $n$-dimensional subspace of $\mathcal{H}_m$ (which will depend on $\Psi$). This is the second qualitative consequence of Lubkin's result we promised to arrive at at the outset. As far as we are aware, it does not appear to have been pointed out before. But, for our purposes, it will be of equal importance to the first consequence. Similarly, of course:

$$\text{For } n \gg m,\ \rho_n^{mn} \text{ will probably be close to } \frac{1}{m}\sum_{a=1}^{m} |\tilde f_a\rangle\langle\tilde f_a| \tag{106}$$

where $\{\tilde f_1, \ldots, \tilde f_m\}$ is a basis for an $m$-dimensional subspace of $\mathcal{H}_n$ (which will again depend on $\Psi$).
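The statement that the two reduced density matrices share their non-zero eigenvalues can be probed numerically without an eigenvalue solver, by comparing the traces of powers of $CC^+$ and $C^+C$. This is a consistency check rather than a proof, $C$ is left unnormalised (which does not affect the comparison), and all names are illustrative.

```python
import random

def matmul(A, B):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def dagger(A):
    """Conjugate transpose."""
    return [[A[i][j].conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]

def trace_of_power(A, p):
    """tr(A^p) for a square matrix A and integer p >= 1."""
    P = A
    for _ in range(p - 1):
        P = matmul(P, A)
    return sum(P[i][i] for i in range(len(A)))

def power_traces(m, n, max_power=4, seed=0):
    """Pairs (tr((C C^+)^p), tr((C^+ C)^p)) for a random m x n matrix C."""
    rng = random.Random(seed)
    C = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
         for _ in range(m)]
    big = matmul(C, dagger(C))      # m x m
    small = matmul(dagger(C), C)    # n x n
    return [(trace_of_power(big, p), trace_of_power(small, p))
            for p in range(1, max_power + 1)]
```

Equal power traces are what one expects if the larger matrix has the spectrum of the smaller one padded with zeros, as stated above.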

We remark that, in preparation for the argument we give below for the claim that $\rho_S^{\mathrm{modern}}$ is well-approximated by $\rho_S^{\mathrm{modapprox}}$, it is useful to observe that/how (105) and (101) are consistent with (97) and (98).

Further insight into the origin of (100), (101), (105) and (106) can be had by recalling that a given vector $\Psi \in \mathcal{H}_m \otimes \mathcal{H}_n$ - which we have written so far in the form (95) - can also be written as a single sum

$$\Psi = \sum_{i=1}^{\min(m,n)} \lambda_i^{1/2}\, \tilde e_i \otimes \tilde f_i \tag{107}$$

for suitable choices of basis $\{\tilde e_1, \ldots, \tilde e_m\}$ on $\mathcal{H}_m$ and $\{\tilde f_1, \ldots, \tilde f_n\}$ on $\mathcal{H}_n$. This is the well-known Schmidt decomposition (cf. e.g. [15] and/or the next paragraph) and the $\lambda_i$ are the same as those discussed above. (100) and (106) may then be viewed as (easy) consequences of the fact that, when $n \gg m$, the $\lambda_i$ in (107) are probably close to $1/m$ for $i = 1, \ldots, m$, while they are zero for $i > m$. Similarly (101) and (105) may be viewed as consequences of the fact that, when $m \gg n$, the $\lambda_i$ in (107) are probably close to $1/n$ for $i = 1, \ldots, n$, while they are zero for $i > n$.



The Schmidt decomposition in the form (107) can actually be derived easily from the results following Equation (103). In this paragraph, we digress to point out how, treating the cases where $n > m$: Denote by $\{x_1, \ldots, x_m\}$ a complete set of orthonormal eigenvectors of $\check\rho_m^{mn}$, and denote by $x_i^a$ the $a$th component of $x_i$. In view of the sentence following Equation (103), we can clearly find a complete set, $\{y_1, \ldots, y_n\}$, of orthonormal eigenvectors of $\check\rho_n^{mn}$ so that, denoting by $y_j^k$ the $k$th component of $y_j$, we have, by (104), $c_{ak}\, y_j^{k*} = \lambda_j^{1/2} x_j^a$ (the left hand side being summed over $k$). (We only need to make sure that the $i$-value of every $x_i$ belonging to each non-zero eigenvalue of $\check\rho_m^{mn}$ matches the $j$-value of a $y_j$ belonging to an equal non-zero eigenvalue of $\check\rho_n^{mn}$ - there being necessarily an equal number of each; for any other $y_j$ [and of course there have to be others whenever $n > m$] the right hand side will anyway vanish.) Also introduce a new basis $\{\tilde e_1, \ldots, \tilde e_m\}$ for $\mathcal{H}_m$ such that $e_a = x_i^{a*}\tilde e_i$ (summed over $i$) and, similarly, introduce a new basis $\{\tilde f_1, \ldots, \tilde f_n\}$ for $\mathcal{H}_n$ such that $f_k = y_j^{k*}\tilde f_j$ (summed over $j$). Then we have (see (95))

$$\Psi = c_{ak}\, e_a \otimes f_k \quad \text{(summed over $a$ and $k$)}$$

$$= c_{ak}\, x_i^{a*} y_j^{k*}\, \tilde e_i \otimes \tilde f_j \quad \text{(summed over $i$, $j$, $a$ and $k$)}$$

$$= \text{(by (104))}\ \lambda_j^{1/2} x_j^a x_i^{a*}\, \tilde e_i \otimes \tilde f_j \quad \text{(summed over $i$, $j$ and $a$)} = \lambda_j^{1/2}\delta_{ij}\, \tilde e_i \otimes \tilde f_j \quad \text{(summed over $i$ and $j$)}.$$

So we have

$$\Psi = \lambda_i^{1/2}\, \tilde e_i \otimes \tilde f_i \quad \text{(summed over $i$)}$$

thus establishing (107) in cases where $n > m$. The cases where $m > n$ are obviously similar. This ends our digression.



XII. MAIN ARGUMENT FOR THE VALIDITY OF THE APPROXIMATION (15) OF $\rho_S^{\mathrm{modern}}$ BY $\rho_S^{\mathrm{modapprox}}$



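Let us now turn to consider the set of unit vectors, $\Psi$, in the Hilbert space of a totem as specified in Section I C - i.e. in the subspace of states with total energies in the range $[E, E+\Delta]$ of $\mathcal{H}_S \otimes \mathcal{H}_B$. Allowing ourselves to make the slight distortion explained before equation (5), we may assume $\mathcal{H}_S$ has an orthonormal basis consisting of vectors $|\epsilon_S, i\rangle$, where $\epsilon_S$ ranges from $\Delta$ to $E$ in steps of $\Delta$, while, for each $\epsilon_S$, the integer, $i$, ranges from 1 to $n_S(\epsilon_S)$; and similarly $\mathcal{H}_B$ has an orthonormal basis consisting of vectors $|\epsilon_B, j\rangle$, where $\epsilon_B$ ranges from $\Delta$ to $E$ in steps of $\Delta$, while, for each $\epsilon_B$, $j$ ranges from 1 to $n_B(\epsilon_B)$. (Below, and as in Section I, we shall sometimes drop the $S$ and $B$ subscripts on the $\epsilon$s when no ambiguity can arise.) Then, we are interested in the set of unit vectors, $\Psi$, in the subspace, which we shall call below $\mathcal{H}_M$, of $\mathcal{H}_S \otimes \mathcal{H}_B$ with total energy exactly $E$. The reason for the name $\mathcal{H}_M$ is that $\mathcal{H}_M$ will clearly have dimension $M$, where $M$ is as in (7).

Each such $\Psi \in \mathcal{H}_M$ is writeable in the form

$$\Psi = \sum_{\epsilon=\Delta}^{E}\ \sum_{i=1}^{n_S(\epsilon)}\ \sum_{j=1}^{n_B(E-\epsilon)} c_\epsilon^{ij}\, |\epsilon, i\rangle \otimes |E-\epsilon, j\rangle \tag{108}$$

where we recall (see above and cf. before Equation (6)) that the sum over $\epsilon$ goes up in integer multiples of $\Delta$. We also note that since $\Psi$ is a unit vector, the sum (with the above indicated ranges) over $\epsilon$, $i$ and $j$ of $|c_\epsilon^{ij}|^2$ equals 1. For such a $\Psi$, the partial trace of $|\Psi\rangle\langle\Psi|$ over $\mathcal{H}_B$, i.e. the reduced density operator, $\rho_S^{\mathrm{modern}}$, on $\mathcal{H}_S$, will then clearly be given by

$$\rho_S^{\mathrm{modern}} = \sum_{\epsilon=\Delta}^{E}\ \sum_{i=1}^{n_S(\epsilon)}\ \sum_{i'=1}^{n_S(\epsilon)} (r_\epsilon^S)_{ii'}\, |\epsilon, i\rangle\langle\epsilon, i'|, \tag{109}$$

where

$$(r_\epsilon^S)_{ii'} = \sum_{j=1}^{n_B(E-\epsilon)} c_\epsilon^{ij}\, c_\epsilon^{i'j*}. \tag{110}$$

We shall find it useful sometimes to think of $\mathcal{H}_S$ as a direct sum $\bigoplus_{\epsilon=\Delta}^{E} \mathcal{H}_\epsilon^S$ (and similarly for $\mathcal{H}_B$) where $\mathcal{H}_\epsilon^S$ is spanned by the $|\epsilon, i\rangle$ for fixed $\epsilon$ as $i$ varies from 1 to $n_S(\epsilon)$ and we shall call the restriction of $\rho_S^{\mathrm{modern}}$ to $\mathcal{H}_\epsilon^S$ simply $r_\epsilon^S$ - its components in the basis consisting of the $|\epsilon, i\rangle$ being obviously the $(r_\epsilon^S)_{ii'}$ introduced above.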

Our aim is to give an argument in favour of the claimed correctness of the proposition, which we state in Section I D, that, in situations of interest, $\rho_S^{\mathrm{modern}}$ will be well-approximated by the $\rho_S^{\mathrm{modapprox}}$ of (15) and, in the course of giving this argument, to make clear how the $n_B(E-\epsilon)$-dimensional subspace of the ($n_S(\epsilon)$-dimensional) energy-$\epsilon$ subspace of $\mathcal{H}_S$ spanned by the $|\widetilde{\epsilon, i}\rangle$ depends on $\Psi$. (We should amplify on this statement by explaining that, when we say that $\rho_S^{\mathrm{modern}}$ is well-approximated by $\rho_S^{\mathrm{modapprox}}$, what we mean is that the values of physical quantities of interest, such as the mean energy and the entropy of the system S, calculated using $\rho_S^{\mathrm{modern}}$ will be close to the values of the same quantities calculated using $\rho_S^{\mathrm{modapprox}}$.)

The main ingredients in our argument concern the average, $\langle r_\epsilon^S\rangle$, of $r_\epsilon^S$ and also the average, $\langle\mathrm{tr}((r_\epsilon^S)^2)\rangle$, of $\mathrm{tr}((r_\epsilon^S)^2)$, where both averages are taken as $\Psi$ ranges over the whole of $\mathcal{H}_M$ (with respect to Haar measure on $\mathcal{H}_M$). The calculations of these quantities are closely similar to the preliminary calculations we carried out above for the average of the density operator $\rho_m^{mn}$ and the average of the trace of its square: the difference being that we are now averaging over all unit $\Psi$ in our full $M$-dimensional Hilbert space $\mathcal{H}_M$ (with $M$ as in (7)) even though what we are averaging is only the restriction, $r_\epsilon^S$, (and the trace of the square of the restriction) of $\rho_S^{\mathrm{modern}}$ to $\mathcal{H}_\epsilon^S$ for fixed $\epsilon$. As a result, while the counterpart of the product, $mn$, in the denominator in our preliminary calculation would just be the single product $n_S(\epsilon)n_B(E-\epsilon)$ were our average only to be over $\mathcal{H}_\epsilon^S \otimes \mathcal{H}_{E-\epsilon}^B$, since we average over the unit vectors of the full Hilbert space $\mathcal{H}_M$, the counterpart will turn out to be $M$. Aside from this difference, to calculate $\langle r_\epsilon^S\rangle$ one proceeds very similarly to the passage, above, between equations (96) and (97); the reader can easily supply the details simply by replacing (96) by (110) etc. One clearly obtains (instead of (97))

$$\langle r_\epsilon^S\rangle = \frac{n_B(E-\epsilon)}{M}\, I_\epsilon^S \tag{111}$$

where $I_\epsilon^S$ denotes the identity on $\mathcal{H}_\epsilon^S$. Similarly, proceeding as in the passage between equations (97) and (99) (but again it will turn out that one needs to replace $mn$ in the denominator by $M$) we easily find:

$$\langle\mathrm{tr}((r_\epsilon^S)^2)\rangle = \frac{2n_B(E-\epsilon)n_S(\epsilon) + n_B(E-\epsilon)n_S(\epsilon)(n_S(\epsilon)-1) + n_B(E-\epsilon)(n_B(E-\epsilon)-1)n_S(\epsilon)}{M(M+1)} = \frac{n_B(E-\epsilon)\,n_S(\epsilon)\,(n_B(E-\epsilon)+n_S(\epsilon))}{M(M+1)}. \tag{112}$$
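To make (111) and (112) concrete, the sketch below evaluates them in exact arithmetic for toy degeneracies. The input lists are hypothetical: `n_S[k]` stands for $n_S(\epsilon_k)$ and `n_B_partner[k]` for $n_B(E-\epsilon_k)$ at the same $\epsilon_k$, and $M$ is taken to be the sum of their products (assumed here to be the definition referred to as (7)).

```python
from fractions import Fraction

def sector_averages(n_S, n_B_partner):
    """Return M and, per energy sector, (<tr r>, <tr r^2>).

    n_S[k] = n_S(eps_k); n_B_partner[k] = n_B(E - eps_k) for the same eps_k.
    """
    M = sum(s * b for s, b in zip(n_S, n_B_partner))
    out = []
    for s, b in zip(n_S, n_B_partner):
        mean_trace = Fraction(s * b, M)  # trace of (111)
        unsimplified = Fraction(2*b*s + b*s*(s - 1) + b*(b - 1)*s, M*(M + 1))
        simplified = Fraction(b*s*(b + s), M*(M + 1))  # second form in (112)
        assert unsimplified == simplified
        out.append((mean_trace, simplified))
    return M, out
```

The mean traces sum to 1 over the sectors, as they must for a normalised state; for example `sector_averages([1, 2, 4], [4, 2, 1])` gives $M = 12$ and a mean trace of $1/3$ in each sector.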



Now, rather as in our arguments for the two qualitative consequences of Lubkin's result (but now our arguments will involve both the counterpart, (111), to (97) as well as the counterpart, (112), to (99)), we observe from (112) that, whenever $n_S(\epsilon) \ll n_B(E-\epsilon)$, $\langle\mathrm{tr}((r_\epsilon^S)^2)\rangle$ will be very close to $M^{-2} n_S(\epsilon)(n_B(E-\epsilon))^2$, which, in the presence of (111), easily implies (i.e. by similar reasoning to that used above in our derivation of (100) and (101)) that $r_\epsilon^S$ must be very close to $M^{-1} n_B(E-\epsilon)\sum_{i=1}^{n_S(\epsilon)} |\epsilon, i\rangle\langle\epsilon, i|$. Moreover whenever $n_B(E-\epsilon) \ll n_S(\epsilon)$, $\langle\mathrm{tr}((r_\epsilon^S)^2)\rangle$ will be very close to $M^{-2} n_B(E-\epsilon)(n_S(\epsilon))^2$, which, again in the presence of (111), easily implies (i.e. by similar reasoning to that used above in our derivation of (105) and (106)) that $r_\epsilon^S$ will be very close to $M^{-1} n_S(\epsilon)\sum_{i=1}^{n_B(E-\epsilon)} |\widetilde{\epsilon, i}\rangle\langle\widetilde{\epsilon, i}|$ where the $|\widetilde{\epsilon, i}\rangle$ denote the elements of an orthonormal basis for an $n_B(E-\epsilon)$-dimensional subspace of the ($n_S(\epsilon)$-dimensional) energy-$\epsilon$ subspace, $\mathcal{H}_\epsilon^S$, of $\mathcal{H}_S$ which will depend on $\Psi$. Comparing these conclusions with the form of Equation (15) we immediately see that, if it were the case that for all $\epsilon$, either $n_S(\epsilon) \gg n_B(E-\epsilon)$ or $n_B(E-\epsilon) \gg n_S(\epsilon)$, then (15) would obviously be a good approximation (at least each term in the sum over $\epsilon$ will be) for all $\epsilon$. However, of course, in typical situations of interest, there will be a region of $\epsilon$ values around the value $E_c$ - see the definitions of terms immediately after Equation (15) - where neither of these statements will hold and $n_S(\epsilon)$ and $n_B(E-\epsilon)$ will be of comparable size. (We remark in passing, though, that, typically, [one will be able to choose $\Delta$ so that] each of these quantities will be much greater than 1 for all or very nearly all $\epsilon$ [which are multiples of $\Delta$ and] in the range $[0, E]$.)

Nevertheless for the sort of situations of interest to us - and, in particular, for the densities of states which increase according to the power law, (18), as considered in Section II, or which increase exponentially, (34), as considered in Sections III and V, or which increase as quadratic exponentials, (58), as considered in Section VI - and assuming the totem energy $E$ and our choice of energy-increment, $\Delta$ (see Section I) are such that $M \gg 1$ (to ensure that the system [and bath] has access to a very large number of states) - one can check that the region of $\epsilon$-values around $E_c$ where $n_B(E-\epsilon)$ and $n_S(\epsilon)$ are of comparable size will always be very small in size compared to $E$, while the sum over this region of $n_S(\epsilon)n_B(E-\epsilon)$ will be very small compared to $M$. (In other words, the integral over this energy-region of the energy-probability density $P_S(\epsilon)$ [see (10)] will be very much less than 1.) Moreover, as $\epsilon$ decreases towards zero, or increases towards $E$ from $E_c$, then for all three densities of states, (18), (34), (58), one may check that the ratio $n_S(\epsilon)/n_B(E-\epsilon)$, respectively $n_B(E-\epsilon)/n_S(\epsilon)$, and hence the counterparts (i.e. with $n_S(\epsilon)$ replacing $m$ and $n_B(E-\epsilon)$ replacing $n$) to the quantities which we called $\eta$ before Equation (100), respectively Equation (105), will get rapidly smaller and hence the relevant notion of closeness (i.e. as in (100), (105)) will get rapidly stronger. It is then straightforward to argue from these statements that quantities of interest such as (cf. (38)) $\bar e_S^{\mathrm{modern}} = \mathrm{tr}(\rho_S^{\mathrm{modern}} H_S)$, $\mathrm{tr}((\rho_S^{\mathrm{modern}})^2)$ itself, and $S_S^{\mathrm{modern}} = -\mathrm{tr}(\rho_S^{\mathrm{modern}} \log\rho_S^{\mathrm{modern}})$ will be closely approximated (respectively) by $\bar e_S^{\mathrm{modapprox}} = \mathrm{tr}(\rho_S^{\mathrm{modapprox}} H_S)$, $\mathrm{tr}((\rho_S^{\mathrm{modapprox}})^2)$, and (cf. (41)) $S_S^{\mathrm{modapprox}} = -\mathrm{tr}(\rho_S^{\mathrm{modapprox}} \log\rho_S^{\mathrm{modapprox}})$.

Concerning the latter two quantities - i.e. the trace of the square of the reduced density operator of S and its von Neumann entropy - there are reasons to expect the approximation of $S_S^{\mathrm{modern}}$ by $S_S^{\mathrm{modapprox}}$ to be even better than the approximation of $\mathrm{tr}((\rho_S^{\mathrm{modern}})^2)$ by $\mathrm{tr}((\rho_S^{\mathrm{modapprox}})^2)$.
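The claim that the crossover region around $E_c$ carries little of the total weight $M$ can be explored numerically. The sketch below is illustrative only: it assumes degeneracies of the simple form $n(\epsilon) = e^{b\epsilon}$ (normalisation constants omitted), takes "comparable" to mean "within a chosen factor `ratio`", and works with logarithms to avoid overflow. None of these choices is taken from the text.

```python
import math

def crossover_weight(log_n_S, log_n_B, steps, ratio=100.0):
    """Fraction of M = sum_k n_S(k) n_B(steps - k), over eps = k*Delta with
    k = 1..steps, carried by energies where n_S and n_B differ by at most
    a factor `ratio`.

    log_n_S, log_n_B: functions of the integer energy index returning
    log-degeneracies.
    """
    log_terms, comparable = [], []
    for k in range(1, steps + 1):
        ls, lb = log_n_S(k), log_n_B(steps - k)
        log_terms.append(ls + lb)
        comparable.append(abs(ls - lb) <= math.log(ratio))
    shift = max(log_terms)  # rescale so the largest term is exp(0)
    weights = [math.exp(t - shift) for t in log_terms]
    return sum(w for w, c in zip(weights, comparable) if c) / sum(weights)

def exponential(b_delta):
    """log n(k) = b * Delta * k for an assumed exponential density."""
    return lambda k: b_delta * k
```

With `exponential(1.0)` for both system and bath and `steps=1000`, only the five energies nearest $E/2$ qualify as comparable, so the returned fraction is 0.005. A quadratic-exponential choice such as `lambda k: q * k * k` can be passed in the same way.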



XIII. FURTHER CHECKS AND DETAILS ON THE VALIDITY OF THE APPROXIMATION (15)

As a partial check of various aspects of all of the above argument, and in justification of our latter remark, it is instructive first to consider the case where, for all $\epsilon$ ($= 0, \Delta, 2\Delta, \ldots$) in the range $[0, E]$, we have $n_S(\epsilon) = 1 = n_B(E-\epsilon)$ where, of course, it is never true that $n_S(\epsilon) \gg n_B(E-\epsilon)$ or that $n_S(\epsilon) \ll n_B(E-\epsilon)$ (nor that each of these quantities is very much greater than 1!) so we can think of this as one sort of 'worst case scenario'. Of course this is not an example that interests us in Part 1, but it would apply e.g. to a totem consisting of a pair of weakly coupled quantum harmonic oscillators (with equal spring constants) for a total energy much greater than the level spacing (and a choice of $\Delta$ equal to the level spacing). For this model, we may clearly write

$$\Psi = \sum_{\epsilon=\Delta}^{E} c_\epsilon\, |\epsilon\rangle |E-\epsilon\rangle \tag{113}$$

so that the reduced density operator of the system, S, will be

$$\rho_S^{\mathrm{modern}} = \sum_{\epsilon=\Delta}^{E} |c_\epsilon|^2\, |\epsilon\rangle\langle\epsilon|,$$

while the formula, (15), for $\rho_S^{\mathrm{modapprox}}$ becomes simply

$$\rho_S^{\mathrm{modapprox}} = \frac{1}{M}\sum_{\epsilon=\Delta}^{E} |\epsilon\rangle\langle\epsilon|$$

where (cf. (7)) $M = E/\Delta$. Clearly, (cf. (38)) the approximate mean energy,

$$\bar e_S^{\mathrm{modapprox}} = \mathrm{tr}(\rho_S^{\mathrm{modapprox}} H_S) = \frac{1}{M}\sum_{\epsilon=\Delta}^{E} \epsilon = \frac{1}{M}\left(\frac{E}{\Delta} + 1\right)\frac{E}{2} \simeq \frac{E}{2}.$$

(In calculating the value of the above sum, we need of course to recall that the sum is over values of $\epsilon$ which are integer multiples of $\Delta$.) Therefore, since this doesn't depend on the $c_\epsilon$, its average over Haar measure (indicated with "$\langle\ \rangle$") takes the same value:

$$\langle\bar e_S^{\mathrm{modapprox}}\rangle \simeq \frac{E}{2}.$$

On the other hand, the average over Haar measure of the exact mean energy, $\bar e_S^{\mathrm{modern}}$, may be calculated as follows:

$$\langle\bar e_S^{\mathrm{modern}}\rangle = \left\langle\sum_{\epsilon=\Delta}^{E} \epsilon\, |c_\epsilon|^2\right\rangle = \sum_{\epsilon=\Delta}^{E} \epsilon \int w_\epsilon\, d\mathrm{Haar}$$

where the integral is over the complex $M$-dimensional sphere of unit vectors in the Hilbert space, $\mathcal{H}_M$, spanned by the vectors of form (113), coordinatized with $w_\epsilon$ ranging over the $(M-1)$-simplex and $\theta_\epsilon$ ranging over the $M$-torus as explained at the beginning of Section XI, where $w_\epsilon = |c_\epsilon|^2$ etc. Obviously the torus factor of the integral just gives 1, so the integral has, by (92), the value $1/M$ for each $\epsilon$. So we conclude that $\langle\bar e_S^{\mathrm{modern}}\rangle$ has the same value as $\langle\bar e_S^{\mathrm{modapprox}}\rangle$, i.e.

$$\langle\bar e_S^{\mathrm{modern}}\rangle \simeq \frac{E}{2}$$

which, of course, has to be the correct value by the symmetry under the interchange of S and B in this case.

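Turning to averages over Haar measure of the trace of the square of the reduced density operator of S, we have, on the one hand,

$$\langle\mathrm{tr}((\rho_S^{\mathrm{modapprox}})^2)\rangle = \left\langle\sum_{\epsilon=\Delta}^{E} \frac{1}{M^2}\right\rangle = \frac{1}{M}.$$

Whereas, on the other hand,

$$\langle\mathrm{tr}((\rho_S^{\mathrm{modern}})^2)\rangle = \left\langle\sum_{\epsilon=\Delta}^{E} |c_\epsilon|^4\right\rangle = \sum_{\epsilon=\Delta}^{E} \int w_\epsilon^2\, d\mathrm{Haar} = \frac{2}{M+1} \simeq \frac{2}{M}$$

(where we have used (93) in calculating the integral) which differs from the approximate value by a factor of 2!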
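The 'worst case' numbers above are simple enough to tabulate exactly. The following sketch assumes $M = E/\Delta$ levels $\epsilon = \Delta, \ldots, E$ as in (113) and uses (92) and (93) with $N = M$; the function name is illustrative.

```python
from fractions import Fraction

def worst_case_averages(M, delta=Fraction(1)):
    """Haar averages for n_S = n_B = 1 with M = E/Delta levels."""
    E = M * delta
    energies = [k * delta for k in range(1, M + 1)]  # eps = Delta, ..., E
    # Same mean energy for modapprox (weights 1/M) and for the Haar
    # average of the exact state (since <w_eps> = 1/M by (92)).
    mean_energy = sum(energies) / M
    purity_approx = Fraction(1, M)                # sum of M terms 1/M^2
    purity_exact = M * Fraction(2, M * (M + 1))   # M copies of J_11, (93)
    return E, mean_energy, purity_approx, purity_exact
```

For `M = 100` and unit level spacing this gives a mean energy of $101/2$, an approximate purity of $1/100$ and an exact average purity of $2/101$, so the ratio of the two purities is $200/101$, close to 2.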

However, if we turn to calculate the averages over Haar measure of the von Neumann entropies of the approximate and exact reduced density operator of S, we find, on the one hand,

$$\langle S_S^{\mathrm{modapprox}}\rangle = \langle -\mathrm{tr}(\rho_S^{\mathrm{modapprox}} \log\rho_S^{\mathrm{modapprox}})\rangle = \langle\log M\rangle = \log M \ (= \log(E/\Delta)). \tag{114}$$

On the other hand,

$$\langle S_S^{\mathrm{modern}}\rangle = \langle -\mathrm{tr}(\rho_S^{\mathrm{modern}} \log\rho_S^{\mathrm{modern}})\rangle = -M\int_{\text{Unit Sphere in } \mathbb{C}^M} w_1\log w_1\, d\mathrm{Haar}$$

$$= \text{(by (89) and (90))}\ -M(M-1)!\int_{\mathrm{Simplex}} w_1\log w_1\, dw_1 \ldots dw_{M-1} = -M(M-1)\int_0^1 w\log w\,(1-w)^{M-2}\, dw.$$

One may do this integral by noticing that $w\log w = (dw^a/da)|_{a=1}$ - obtaining for its value, $d(B(a+1, M-1))/da|_{a=1}$ where $B$ denotes the beta function (see e.g. [30]). One finds that $-M(M-1)$ times this simplifies (using (21)) to $\psi(1+M) - \psi(2)$ where $\psi$ denotes the $\psi$ (or 'digamma') function defined by $\psi(x) = d\log\Gamma(x)/dx$, and this [30], in turn, equals $\sum_{k=2}^{M} 1/k$ which, by the standard asymptotic expansion involving Euler's constant, $C$ ($= 0.5772\ldots$), is equal to $\log M + C - 1 + 1/2M + O(1/M^2)$. So we conclude that

$$\langle S_S^{\mathrm{modern}}\rangle = \log M + C - 1 + O(1/M) \ (= \log(E/\Delta) + C - 1 + O(\Delta/E)) \tag{115}$$

where $C$ is Euler's constant ($0.5772\ldots$).
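Both steps of this evaluation can be cross-checked numerically. The sketch below is a rough check only: it compares the harmonic sum with the quoted asymptotic form, and estimates the one-dimensional integral with a plain midpoint rule (a choice made here for simplicity; any quadrature would do).

```python
import math

EULER_C = 0.5772156649015329  # Euler's constant

def harmonic_tail(M):
    """psi(1 + M) - psi(2) = sum_{k=2}^{M} 1/k."""
    return sum(1.0 / k for k in range(2, M + 1))

def asymptotic(M):
    """log M + C - 1 + 1/(2M), i.e. the expansion up to O(1/M^2)."""
    return math.log(M) + EULER_C - 1.0 + 1.0 / (2 * M)

def entropy_integral(M, steps=200_000):
    """Midpoint estimate of -M(M-1) * int_0^1 w log(w) (1-w)^(M-2) dw, M >= 2."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        w = (i + 0.5) * h
        total += w * math.log(w) * (1.0 - w) ** (M - 2)
    return -M * (M - 1) * total * h
```

For small $M$ the quadrature should reproduce `harmonic_tail(M)` closely, and for large $M$ the harmonic sum should approach `asymptotic(M)`.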

Comparison of (115) and (114) shows that the use of (15) for this 'worst case scenario' leads to a von Neumann entropy which, for large $M$, is very close to the average over Haar measure of its actual value: the two differ by $1 - C \approx 0.42$, which is negligible compared with $\log M$. In view of the fact that both of these values are very close to the maximum possible value, $\log M$, of the entropy of any density operator on $\mathcal{H}_S$ (which is of course $M$-dimensional in this case) we conclude both that most totem states, for this model, must have a reduced density operator on S whose von Neumann entropy is close to $\log M$; and that the use of (15) leads to a good approximation for this value. And both of these things hold even though, as we saw above, our general arguments do not apply to this case and even though, for this case, as we saw above, (15) leads to a trace of the square of the reduced density operator of S which is only half of the average over Haar measure of its actual value.

We will next use the Lubkin-Page asymptotic formula, (87), to obtain a result which tends to confirm the accuracy of our general formula (41), obtained using (15), for the von Neumann entropy for our densities of states of interest, (18), (34), (58). Our result will show that the value of the von Neumann entropy obtained with (15) well-approximates a certain restricted average of the exact von Neumann entropy. Before we present this result, we shall find it helpful to first explain what we mean here by a 'restricted average' in a different context:

Let us look back at the result essentially due to Lubkin, (99), which we (re-)obtained above, for the average over Haar measure on vectors, $\Psi$, belonging to the tensor product, $\mathcal{H}_m \otimes \mathcal{H}_n$, of two Hilbert spaces, of the trace of the square of the reduced density operator, $\rho_m^{mn}$, on $\mathcal{H}_m$. Averaging over all totem vectors, $\Psi \in \mathcal{H}_m \otimes \mathcal{H}_n$, amounts, as we explained above, to averaging with the invariant measure on the complex $mn$-sphere over the coefficients, $c_{ak}$, in the basis-expansion, (95), of $\Psi$, which, in turn, writing $c_{ak}$ as $w_{ak}^{1/2} e^{i\theta_{ak}}$, we saw, amounts to integrating w.r.t. the $w_{ak}$ over the $(mn-1)$-simplex and w.r.t. the $\theta_{ak}$ over the $mn$-torus. What we now wish to point out is that, if we restrict to $c_{ak}$ which take the form $(1/\sqrt{mn})\, e^{i\theta_{ak}}$ and just average over these (i.e. by integrating with respect to the $\theta_{ak}$ over the $mn$-torus) one easily finds - denoting our restricted average with the symbol '[ ]' - that

$$[\mathrm{tr}((\rho_m^{mn})^2)] = \frac{m+n-1}{mn}$$

and this is not a bad approximation to the value, $(m+n)/(mn+1)$, of the full average, $\langle\mathrm{tr}((\rho_m^{mn})^2)\rangle$, of $\mathrm{tr}((\rho_m^{mn})^2)$ - the two expressions differing, in fact, only by terms of order $1/mn$. In terms of our geometrical picture, in which averaging w.r.t. Haar measure amounts to integrating over the $(mn-1)$-simplex times the $mn$-torus, in passing from the unrestricted average, '$\langle\ \rangle$', to our restricted average, '[ ]', we have replaced the integral over the $(mn-1)$-simplex by the value at its centroid (where $w_{ak} = 1/mn$ for all $a$ and $k$), but continue to integrate over the $mn$-torus. Of course this restricted average, '[ ]', is a basis-dependent notion, but what we have learnt is that we can choose any bases ($\{e_1, \ldots, e_m\}$ and $\{f_1, \ldots, f_n\}$) we like and we obtain this reasonably good approximation to the unrestricted average (at least for the quantity $\mathrm{tr}((\rho_m^{mn})^2)$).
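The size of the gap between the restricted and unrestricted averages can be made explicit in exact arithmetic. In the sketch below the restricted value is obtained by the same counting as for (99), with every $|c_{ak}|^2$ fixed at $1/mn$; the function names are illustrative.

```python
from fractions import Fraction

def restricted_purity(m, n):
    """[tr(rho^2)]: all |c_ak|^2 = 1/(mn), phases averaged over the torus."""
    # Terms surviving the phase average: cases (i), (ii), (iii) as for (99).
    surviving_terms = m * n + n * m * (m - 1) + n * (n - 1) * m
    return Fraction(surviving_terms, (m * n) ** 2)

def full_purity(m, n):
    """Unrestricted Haar average (99)."""
    return Fraction(m + n, m * n + 1)
```

One finds `full_purity(m, n) - restricted_purity(m, n)` equal to $(m-1)(n-1)/(mn(mn+1))$, which is indeed of order $1/mn$.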

We conclude that the corresponding restricted set of totem vectors (cf. (95)),

$$\Psi = \frac{1}{\sqrt{mn}}\, e^{i\theta_{ak}}\, e_a \otimes f_k \quad \text{(summed over $a$ and $k$)}$$

is (for any choice of bases, $\{e_1, \ldots, e_m\}$ and $\{f_1, \ldots, f_n\}$, and as the $\theta_{ak}$ range over the $mn$-torus) a sufficiently representative set of totem vectors for the restricted average over this set to indicate sufficiently well the behaviour of a generic totem state, $\Psi$ (at least as far as $\mathrm{tr}((\rho_m^{mn})^2)$ is concerned).

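We shall proceed in a similar spirit, but now for the totem of Section I C. We expect that it won't make too big a difference if, instead of averaging over the full set of totem vectors, $\Psi \in \mathcal{H}_M$, we consider a suitable restricted average. To motivate the restriction that we shall make, we notice first that, if we expand such vectors, $\Psi$, as in (108), then it follows from (111) that the (unrestricted) average value (i.e. over Haar measure on the set of all $\Psi \in \mathcal{H}_M$) of the trace of each $r_\epsilon^S$ (defined in the paragraph after (110)) is given by

$$\langle\mathrm{tr}(r_\epsilon^S)\rangle = \left\langle\sum_{i=1}^{n_S(\epsilon)}\sum_{j=1}^{n_B(E-\epsilon)} |c_\epsilon^{ij}|^2\right\rangle = \frac{n_S(\epsilon)n_B(E-\epsilon)}{M} \ (= P_S(\epsilon)\Delta) \tag{116}$$

the equality in parenthesis following from (110).

In view of this, we take our restricted average to be over vectors, $\Psi \in \mathcal{H}_M$, such that, in the expansion, (108), for each $\epsilon$, the coefficients $c_\epsilon^{ij}$ are constrained to satisfy exactly

$$\sum_{i=1}^{n_S(\epsilon)}\sum_{j=1}^{n_B(E-\epsilon)} |c_\epsilon^{ij}|^2 \ (= \mathrm{tr}(r_\epsilon^S)) = \frac{n_S(\epsilon)n_B(E-\epsilon)}{M}.$$

(We remark that, in view of what we explained in the previous two paragraphs, we could alternatively restrict much further and simply average over $\Psi$ in (108) for which every $c_\epsilon^{ij}$ takes the form $e^{i\theta_\epsilon^{ij}}/\sqrt{M}$ and still be able to arrive at similar conclusions to those below. However the restriction we adopt has the advantage of allowing us to directly use the Lubkin-Page approximation in exactly the form (87).) In other words, denoting $n_S(\epsilon)n_B(E-\epsilon)/M$ by $\mu_\epsilon$, we average over $\Psi \in \mathcal{H}_M$ which take the form $\bigoplus_{\epsilon=\Delta}^{E} \sqrt{\mu_\epsilon}\,\Psi_\epsilon^M$ (each $\Psi_\epsilon^M$ being normalized) where we regard $\mathcal{H}_M$ as the direct sum, $\bigoplus_{\epsilon=\Delta}^{E} \mathcal{H}_\epsilon^M$, where, for each $\epsilon$ ($= 0, \Delta, \ldots, E$), $\mathcal{H}_\epsilon^M$ denotes the ($n_S(\epsilon)n_B(E-\epsilon)$-dimensional) Hilbert subspace of $\mathcal{H}_M$ spanned by the vectors $|\epsilon, i\rangle|E-\epsilon, j\rangle$, $i = 1 \ldots n_S(\epsilon)$, $j = 1 \ldots n_B(E-\epsilon)$ in $\mathcal{H}_\epsilon^S \otimes \mathcal{H}_{E-\epsilon}^B$ - see after equation (110). For such restricted $\Psi$, $\rho_S^{\mathrm{modern}}$ will take the form

$$\rho_S^{\mathrm{modern}} = \bigoplus_{\epsilon=\Delta}^{E} \mu_\epsilon R_\epsilon^S$$

where $R_\epsilon^S$ is the partial trace of $|\Psi_\epsilon^M\rangle\langle\Psi_\epsilon^M|$ over $\mathcal{H}_{E-\epsilon}^B$ (which will equal $r_\epsilon^S$ divided by its trace, which is $\mu_\epsilon$). Clearly, by the lemma in Section IV we therefore have

$$S(\rho_S^{\mathrm{modern}}) = S\left(\bigoplus_{\epsilon=\Delta}^{E} \mu_\epsilon R_\epsilon^S\right) = \sum_{\epsilon=\Delta}^{E} \mu_\epsilon S(R_\epsilon^S) - \sum_{\epsilon=\Delta}^{E} \mu_\epsilon\log\mu_\epsilon. \tag{117}$$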
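The direct-sum entropy formula (117) can be illustrated with a few lines of arithmetic on eigenvalues. This sketch takes each block $R_\epsilon^S$ to be given by its list of eigenvalues (summing to 1) and the weights $\mu_\epsilon$ to sum to 1; these inputs are hypothetical toy data.

```python
import math

def entropy(eigs):
    """von Neumann entropy -sum p log p of a list of eigenvalues."""
    return -sum(p * math.log(p) for p in eigs if p > 0)

def direct_sum_entropy(mus, blocks):
    """Both sides of (117) for rho = direct sum of mu_e * R_e.

    mus: weights mu_e; blocks: eigenvalue lists of the normalised R_e.
    """
    eigs = [mu * p for mu, block in zip(mus, blocks) for p in block]
    lhs = entropy(eigs)
    rhs = sum(mu * entropy(b) for mu, b in zip(mus, blocks)) + entropy(mus)
    return lhs, rhs
```

The second term on the right, `entropy(mus)`, is just $-\sum_\epsilon \mu_\epsilon\log\mu_\epsilon$; the two returned values agree up to rounding.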



But now we notice that, if we identify $m$ with $n_S(\epsilon)$ and $n$ with $n_B(E-\epsilon)$, then we can identify $\mathcal{H}_\epsilon^M$ with the Hilbert space, $\mathcal{H}_m \otimes \mathcal{H}_n$, of Sections I and XI and, under this identification, $R_\epsilon^S$ is identified with $\rho_m^{mn}$, and $S(R_\epsilon^S)$ with $S(\rho_m^{mn})$. Moreover, averaging $S(R_\epsilon^S)$ over $\mathcal{H}_\epsilon^M$ is, under (the reverse of) this identification, then obviously the same as taking the unrestricted average of $S(\rho_m^{mn})$ over Haar measure on unit vectors in $\mathcal{H}_m \otimes \mathcal{H}_n$ and so we may estimate its value using the Lubkin-Page approximation (87). Making these identifications, if we now use '[ ]' to denote our restricted average over our restricted totem vectors, $\Psi$, and '$\langle\ \rangle$' to denote the unrestricted average over Haar measure on unit vectors in $\mathcal{H}_\epsilon^M$ for each $\epsilon$, we may calculate using the formula (117) in our lemma of Section IV

$$[S(\rho_S^{\mathrm{modern}})] = \left[\sum_{\epsilon=\Delta}^{E} \mu_\epsilon S(R_\epsilon^S)\right] - \sum_{\epsilon=\Delta}^{E} \mu_\epsilon\log\mu_\epsilon = \sum_{\epsilon=\Delta}^{E} \mu_\epsilon\langle S(R_\epsilon^S)\rangle - \sum_{\epsilon=\Delta}^{E} \mu_\epsilon\log\mu_\epsilon$$

which, recalling that $\mu_\epsilon = n_S(\epsilon)n_B(E-\epsilon)/M$ and using (87), equals

$$\sum_{\epsilon=\Delta}^{E} \frac{n_S(\epsilon)n_B(E-\epsilon)}{M}\left[\log\bigl(\min(n_S(\epsilon), n_B(E-\epsilon))\bigr) - \frac{\min(n_S(\epsilon), n_B(E-\epsilon))}{2\max(n_S(\epsilon), n_B(E-\epsilon))} - \log\left(\frac{n_S(\epsilon)n_B(E-\epsilon)}{M}\right) + O\left(\frac{1}{n_S(\epsilon)n_B(E-\epsilon)}\right)\right]$$

which easily simplifies to

$$[S(\rho_S^{\mathrm{modern}})] = -M^{-1}\left(\sum_{\epsilon=\Delta}^{E_c} n_S(\epsilon)n_B(E-\epsilon)\log(M^{-1}n_B(E-\epsilon)) + \sum_{\epsilon=E_c+\Delta}^{E} n_S(\epsilon)n_B(E-\epsilon)\log(M^{-1}n_S(\epsilon))\right)$$

$$\qquad - \frac{M^{-1}}{2}\left(\sum_{\epsilon=\Delta}^{E_c} n_S(\epsilon)^2 + \sum_{\epsilon=E_c+\Delta}^{E} n_B(E-\epsilon)^2\right) + O(1) \tag{118}$$

where $E_c$ is as defined after (15).
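The simplification leading to (118) is pure algebra and can be checked on toy data. The sketch below assumes the Lubkin-Page estimate in the form $\log(\min) - \min/(2\max)$, as it appears in the expression preceding (118), drops the $O(\cdot)$ remainders, and lets $\min$/$\max$ decide which side of $E_c$ each energy lies on. The input lists are hypothetical: `n_S[k]` is $n_S(\epsilon_k)$ and `n_B_partner[k]` is $n_B(E-\epsilon_k)$.

```python
import math

def restricted_entropy_terms(n_S, n_B_partner):
    """Return (unsimplified sum, first line of (118), second line of (118))."""
    M = sum(s * b for s, b in zip(n_S, n_B_partner))
    direct = first = second = 0.0
    for s, b in zip(n_S, n_B_partner):
        mu = s * b / M
        lo, hi = min(s, b), max(s, b)
        # mu * (Lubkin-Page estimate) - mu * log(mu), before simplification
        direct += mu * (math.log(lo) - lo / (2 * hi)) - mu * math.log(mu)
        first += -mu * math.log(hi / M)   # -M^-1 n_S n_B log(M^-1 max)
        second += -lo * lo / (2 * M)      # -(M^-1 / 2) min^2
    return direct, first, second
```

The first returned value equals the sum of the other two up to rounding, and the 'error term' (the third value) is negative.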



Comparing (118) with (41), we notice that the first line of (118) coincides with the formula, (41), $S_S^{\mathrm{modapprox}}$ for the von Neumann entropy of $\rho_S^{\mathrm{modapprox}}$ which we derived from (15). Thus we may conclude that our restricted average over totem vectors of $S_S^{\mathrm{modern}}$ will be given by the formula we gave for $S_S^{\mathrm{modapprox}}$ in (41) - plus an 'error term' given by the last line of (118). Moreover, the close agreement found above between $\langle S_S^{\mathrm{modern}}\rangle$ and $\langle S_S^{\mathrm{modapprox}}\rangle$ in the 'worst case scenario' discussed above strongly suggests that the same statement will be true for the unrestricted average. In order to conclude that this amounts to an independent check of the correctness of the approximate formula $S_S^{\mathrm{modapprox}}$ of (41) for our densities of states of interest, (18), (34), (58), it remains to show that (/investigate when) the 'error term' (i.e. the second line in (118)) is small. To end this section we turn to this last question:

It is in fact easy to see (after converting the sum to an integral) that: (a) for our power-law densities of states, (18), with $A_S = A_B$ and $N_S = N_B = N$, say, the magnitude of the last line of (118) (minus the $O(1)$ term) is (using Stirling's approximation - see Section II) of order $1/\sqrt{\pi N}$; (b) for our (equal) exponential densities of states, (34), it is of order $(1/bE)(1 - e^{-bE})$ and (c) for our (equal) quadratic densities of states, (58), it is (approximating the integral with the leading term of the asymptotic formula in Endnote [42]) of order $\exp(-qE^2/2)$. These terms will all be much smaller than typical values of the first line of (118) provided $N$ is large in (a), provided $E \gg 1/b$ (cf. Equation (36)) in (b), and provided $E \gg 1/\sqrt{q}$ (cf. Section VI) in (c).
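As a rough numerical illustration of case (b), the error term can be summed directly for an assumed exponential degeneracy $n(\epsilon) = e^{b\epsilon}$ (normalisation omitted, equal for system and bath) and compared with a continuum estimate. The prefactor $1/(2M)$ used here is the one implied by the $\min/(2\max)$ term of the Lubkin-Page estimate; the function names and the discretisation $\epsilon = k\Delta$, $k = 1, \ldots,$ `steps`, are assumptions of this sketch.

```python
import math

def error_term_exponential(b_delta, steps):
    """(1/2M) * sum_k min(n_S, n_B)^2 for n(eps) = exp(b*eps), eps = k*Delta."""
    bE = b_delta * steps
    # Every product n_S n_B equals exp(bE), so M = steps * exp(bE); all
    # quantities are rescaled by exp(-bE) to keep the numbers moderate.
    M_scaled = float(steps)
    s = sum(math.exp(2 * b_delta * min(k, steps - k) - bE)
            for k in range(1, steps + 1))
    return s / (2 * M_scaled)

def continuum_estimate(bE):
    """Integral approximation (1 - exp(-bE)) / (2 bE) of the same quantity."""
    return (1 - math.exp(-bE)) / (2 * bE)
```

For $b\Delta = 0.01$ and 2000 steps (so $bE = 20$) both expressions are close to 0.025, i.e. of order $1/bE$ and small when $E \gg 1/b$.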



So in all cases of interest here, and, no doubt, in many others too, the last line of (118) will be negligibly small 